Preface


The study of digital communications is an essential element of the undergraduate and 
postgraduate levels of present-day electrical and computer engineering programs. This 
book is appropriate for both levels. 


The introductory chapter is motivational, beginning with a brief history of digital 
communications, and continuing with sections on the communication process, digital 
communications, multiple-access and multiplexing techniques, and the Internet. Four 
themes organize the remaining nine chapters of the book. 

Mathematics of Digital Communications 

The first theme of the book provides a detailed exposé of the mathematical underpinnings
of digital communications, with continuous mathematics aimed at the communication 
channel and interfering signals, and discrete mathematics aimed at the transmitter and 
receiver: 

• Chapter 2, Fourier Analysis of Signals and Systems, lays down the fundamentals for 
the representation of signals and linear time-invariant systems, as well as analog 
modulation theory. 

• Chapter 3, Probability Theory and Bayesian Inference, presents the underlying 
mathematics for dealing with uncertainty and the Bayesian paradigm for 
probabilistic reasoning. 

• Chapter 4, Stochastic Processes, focuses on weakly or wide-sense stationary 
processes, their statistical properties, and their roles in formulating models for 
Poisson, Gaussian, Rayleigh, and Rician distributions. 

• Chapter 5, Information Theory, presents the notions of entropy and mutual 
information for discrete as well as continuous random variables, leading to Shannon’s
celebrated theorems on source coding, channel coding, and information capacity, as 
well as rate-distortion theory. 

From Analog to Digital Communications 

The second theme of the book, covered in Chapter 6, describes how analog waveforms are 
transformed into coded pulses. It addresses the challenge of performing the transformation 
with robustness, bandwidth preservation, or minimal computational complexity. 

Signaling Techniques 

Three chapters address the third theme, each focusing on a specific form of channel 
impairment: 

• In Chapter 7, Signaling over Additive White Gaussian Noise (AWGN) Channels, the 
impairment is the unavoidable presence of channel noise, which is modeled as 
additive white Gaussian noise (AWGN). This model is well-suited for the
signal-space diagram, which brings insight into the study of phase-shift keying (PSK),
quadrature-amplitude modulation (QAM), and frequency-shift keying (FSK) as 
different ways of accommodating the transmission and reception of binary data. 

• In Chapter 8, Signaling over Band-Limited Channels, bandwidth limitation assumes 
center stage, with intersymbol interference (ISI) as the source of channel impairment. 

• Chapter 9, Signaling over Fading Channels, focuses on fading channels in wireless 
communications and the practical challenges they present. The channel impairment 
here is attributed to the multipath phenomenon, so called because the transmitted 
signal reaches the receiver via a multiplicity of paths. 

Error-control Coding 

Chapter 10 addresses the practical issue of reliable communications. To this end, various 
techniques of the feedforward variety are derived therein, so as to satisfy Shannon’s 
celebrated coding theorem. 

Two families of error-correcting codes are studied in the chapter: 

• Legacy (classic) codes, which embody linear block codes, cyclic codes, and 
convolutional codes. Although different in their structural compositions, they look 
to algebraic mathematics as the procedure for approaching the Shannon limit. 

• Probabilistic compound codes, which embody turbo codes and low-density parity- 
check (LDPC) codes. What is remarkable about these two codes is that they both 
approach the Shannon limit with doable computational complexity in a way that was 
not feasible until 1993. The trick behind this powerful information-processing 
capability is the adoption of random codes, the origin of which could be traced to 
Shannon’s 1948 classic paper. 


Analog in Digital Communication 

When we think of digital communications, we must not overlook the fact that such a 
system is of a hybrid nature. The channel across which data are transmitted is analog, 
exemplified by traditional telephone and wireless channels, and many of the sources 
responsible for the generation of data (e.g., speech and video) are of an analog kind. 
Moreover, certain principles of analog modulation theory, namely double sideband- 
suppressed carrier (DSB-SC) and vestigial sideband (VSB) modulation schemes, include 
binary phase-shift keying (PSK) and offset QPSK as special cases, respectively. 

It is with these points in mind that Chapter 2 includes 

• detailed discussion of communication channels as examples of linear systems, 

• analog modulation theory, and 

• phase and group delays. 

Hilbert Transform 

The Hilbert transform, discussed in Chapter 2, plays a key role in the complex 
representation of signals and systems, whereby 

• a band-pass signal, formulated around a sinusoidal carrier, is transformed into an 
equivalent complex low-pass signal; 

• a band-pass system, be it a linear channel or filter with a midband frequency, is
transformed into an equivalent complex low-pass system. 

Both transformations are performed without loss of information, and their use changes a 
difficult task into a much simpler one in mathematical terms, suitable for simulation on a 
computer. However, one must accommodate the use of complex variables. 

The Hilbert transform also plays a key role in Chapter 7. In formulating the method of 
orthogonal modulation, we show that one can derive the well-known formulas for the 
noncoherent detection of binary frequency-shift keying (FSK) and differential phase-shift 
keying (DPSK) signals, given unknown phase, in a much simpler manner than following 
traditional approaches that involve the use of the Rician distribution.

Discrete-time Signal Processing 

In Chapter 2, we briefly review finite-duration impulse response (FIR) or tapped-delay
line (TDL) filters, followed by the discrete Fourier transform (DFT) and a well-known fast
Fourier transform (FFT) algorithm for its computational implementation. FIR filters and
FFT algorithms feature prominently in: 

• Modeling of the raised-cosine spectrum (RCS) and its square-root version
(SQRCS), which are used in Chapter 8 to mitigate the ISI in band-limited channels; 

• Implementing the Jakes model for fast fading channels, demonstrated in Chapter 9; 

• Using FIR filtering to simplify the mathematical exposition of the most difficult 
form of channel fading, namely, the doubly spread channel (in Chapter 9). 

Another topic of importance in discrete-time signal processing is linear adaptive filtering, 
which appears: 

• In Chapter 6, dealing with differential pulse-code modulation (DPCM), where an 
adaptive predictor constitutes a key functional block in both the transmitter and 
receiver. The motivation here is to preserve channel bandwidth at the expense of 
increased computational complexity. The algorithm described therein is the widely 
used least mean-square (LMS) algorithm.

• In Chapter 7, dealing with the need for synchronizing the receiver to the transmitter, 
where two algorithms are described, one for recursive estimation of the group delay 
(essential for timing recovery) and the other for recursive estimation of the unknown 
carrier phase (essential for carrier recovery). Both algorithms build on the LMS 
principle so as to maintain linear computational complexity. 

Digital Subscriber Lines 

Digital subscriber lines (DSLs), covered in Chapter 8, have established themselves as an 
essential tool for transforming a linear wideband channel, exemplified by the twisted-wire 
pair, into a discrete multitone (DMT) channel that is capable of accommodating data 
transmission at multiple megabits per second. Moreover, the transformation is afforded 
practical reality by exploiting the FFT algorithm, with the inverse FFT used in the 
transmitter and the FFT used in the receiver. 

Diversity Techniques 

As already mentioned, the wireless channel is one of the most challenging media for 
digital communications. The difficulty of reliable data transmission over a wireless 
channel is attributed to the multipath phenomenon. Three diversity techniques developed
to get around this practical difficulty are covered in Chapter 9: 

• Diversity on receive, the traditional approach, whereby an array of multiple antennas 
operating independently is deployed at the receiving end of a wireless channel. 

• Diversity on transmit, which operates by deploying two or more independent 
antennas at the transmit end of the wireless channel. 

• Multiple-input multiple-output (MIMO) channels, where multiple antennas (again
operating independently) are deployed at both ends of the wireless channel. 

Among these three forms of diversity, the MIMO channel is naturally the most powerful in 
information-theoretic terms: an advantage gained at the expense of increased 
computational complexity. 

Turbo Codes 

Error-control coding has established itself as the most commonly used technique for 
reliable data transmission over a noisy channel. Among the challenging legacies bestowed 
by Claude Shannon was how to design a code that would closely approach the so-called 
Shannon limit. For over four decades, increasingly more powerful coding algorithms were 
described in the literature; however, it was the turbo code that had the honor of closely
approaching the Shannon limit, and doing so in a computationally feasible manner. 

Turbo codes, together with the associated maximum a posteriori (MAP) decoding 
algorithm, occupy a large portion of Chapter 10, which also includes:

• Detailed derivation of the MAP algorithm and an illustrative example of how it 
operates; 

• The extrinsic information transfer (EXIT) chart, which provides an experimental 
tool for the design of turbo codes; 

• Turbo equalization, for demonstrating applicability of the turbo principle beyond 
error-control coding. 

Placement of Information Theory 

Typically, information theory is placed just before the chapter on error-control coding. In 
this book, it is introduced early because: 


To elaborate: 

• Chapter 6 presents the relevance of source coding to pulse-code modulation (PCM), 
differential pulse-code modulation (DPCM), and delta modulation. 

• Comparative evaluation of M-ary PSK versus M-ary FSK, done in Chapter 7,
requires knowledge of Shannon’s information capacity law.

• Analysis and design of DSL, presented in Chapter 8, also builds on Shannon’s 
information capacity law. 

• Channel capacity in Shannon’s coding theorem is important to diversity techniques, 
particularly of the MIMO kind, discussed in Chapter 9. 

Except for Chapter 1, each of the remaining nine chapters offers the following:

• Illustrative examples are included to strengthen the understanding of a theorem or 
topic in as much detail as possible. Some of the examples are in the form of 
computer experiments. 

• An extensive list of end-of-chapter problems is grouped by section to fit the
material covered in each chapter. The problems range from relatively easy ones all 
the way to more challenging ones. 

• In addition to the computer-oriented examples, nine computer-oriented experiments 
are included in the end-of-chapter problems. 

The Matlab codes for all of the computer-oriented examples in the text, as well as other 
calculations performed on the computer, are available at www.wiley.com/college/haykin. 


Eleven appendices broaden the scope of the theoretical as well as practical material 
covered in the book: 

• Appendix A, Advanced Probabilistic Models, covers the chi-square distribution,
log-normal distribution, and Nakagami distribution that includes the Rayleigh 
distribution as a special case and is somewhat similar to the Rician distribution. 
Moreover, an experiment is included therein that demonstrates, in a step-by-step 
manner, how the Nakagami distribution evolves into the log-normal distribution in 
an approximate manner, demonstrating its adaptive capability. 

• Appendix B develops tight bounds on the Q-function. 

• Appendix C discusses the ordinary Bessel function and its modified form.

• Appendix D describes the method of Lagrange multipliers for solving constrained 
optimization problems. 

• Appendix E derives the formula for the channel capacity of the MIMO channel 
under two scenarios: one that assumes no knowledge of the channel by the 
transmitter, and the other that assumes this knowledge is available to the transmitter 
via a narrowband feedback link. 

• Appendix F discusses the idea of interleaving, which is needed for dealing with
bursts of interfering signals experienced in wireless communications. 

• Appendix G addresses the peak-to-average power reduction (PAPR) problem, 
which arises in the use of orthogonal frequency-division multiplexing (OFDM) for 
both wireless and DSL applications. 

• Appendix H discusses solid-state nonlinear power amplifiers, which play a critical
role in the limited life of batteries in wireless communications. 

• Appendix I presents a short exposé of Monte Carlo integration: a theorem that deals
with mathematically intractable problems. 

• Appendix J studies maximal-length sequences, also called m-sequences, which are
generated using linear feedback shift registers (LFSRs). An important
application of maximal-length sequences (viewed as pseudo-random noise) is in
designing direct-sequence spread-spectrum communications for code-division
multiple access (CDMA).

• Finally, Appendix K provides a useful list of mathematical formulas and functions.


Typically, the square-root of minus one is denoted by the italic symbol j, and the 
differential operator (used in differentiation as well as integration) is denoted by the italic 
symbol d. In reality, however, both of these terms are operators, each one in its own way: 
it is therefore incorrect to use italic symbols for their notations. Furthermore, italic j and 
italic d are also frequently used as indices or to represent other matters, thereby raising the 
potential for confusion. Accordingly, throughout the book, roman j and roman d are used to
denote the square root of minus one and the differential operator, respectively. 


In writing this book every effort has been made to present the material in the manner 
easiest to read so as to enhance understanding of the topics covered. Moreover, cross- 
references within a chapter as well as from chapter to chapter have been included 
wherever the need calls for it. 

Finally, every effort has been made by the author as well as compositor of the book to 
make it as error-free as humanly possible. In this context, the author would welcome 
receiving notice of any errors discovered after publication of the book. 


In writing this book I have benefited enormously from technical input, persistent support, 
and permissions provided by many. 

I am grateful to colleagues around the world for technical inputs that have made a 
significant difference in the book; in alphabetical order, they are: 

• Dr. Daniel Costello, Jr., University of Notre Dame, for reading and providing useful 
comments on the maximum likelihood decoding and maximum a posteriori 
decoding materials in Chapter 10. 

• Dr. Dimitri Bertsekas, MIT, for permission to use Table 3.1 on the Q-function in 
Chapter 3, taken from his co-authored book on the theory of probability. 

• Dr. Lajos Hanzo, University of Southampton, UK, for many useful comments on 
turbo codes as well as low-density parity-check codes in Chapter 10. I am also 
indebted to him for putting me in touch with his colleagues at the University of 
Southampton, Dr. R. G. Maunder and Dr. L. Li, who were extremely helpful in
performing the insightful computer experiments on UMTS-turbo codes and EXIT 
charts in Chapter 10. 

• Dr. Phillip Regalia, Catholic University, Washington DC, for contributing a section 
on serial-concatenated turbo codes in Chapter 10. This section has been edited by 
myself to follow the book’s writing style, and for its inclusion I take full 
responsibility. 

• Dr. Sam Shanmugan, University of Kansas, for his insightful inputs on the use of 
FIR filters and FFT algorithms for modeling the raised-cosine spectrum (RCS) and 
its square-root version (SQRCS) in Chapter 8, implementing the Jakes model in
Chapter 9, as well as other simulation-oriented issues. 

• Dr. Yanbo Xue, University of Alberta, Canada, for performing computer-oriented 
experiments and many other graphical computations throughout the book, using 
well-developed Matlab codes. 

• Dr. Q. T. Zhang, The City University of Hong Kong, for reading through an early 
version of the manuscript and offering many valuable suggestions for improving it. I
am also grateful to his student, Jiayi Chen, for performing the graphical
computations on the Nakagami distribution in Appendix A. 

I’d also like to thank the reviewers who read drafts of the manuscript and provided 
valuable commentary: 

• Ender Ayanoglu, University of California, Irvine 

• Tolga M. Duman, Arizona State University 

• Bruce A. Harvey, Florida State University 

• Bing W. Kwan, FAMU-FSU College of Engineering 

• Chung-Chieh Lee, Northwestern University 

• Heung-No Lee, University of Pittsburgh 

• Michael Rice, Brigham Young University 

• James Ritcey, University of Washington 

• Lei Wei, University of Central Florida 

Production of the book would not have been possible without the following: 

• Daniel Sayre, Associate Publisher at John Wiley & Sons, who maintained not only 
his faith in this book but also provided sustained support for it over the past few 
years. I am deeply indebted to Dan for what he has done to make this book a
reality. 

• Cindy Johnson, Publishing Services, Newburyport, MA, for her dedicated 
commitment to the beautiful layout and composition of the book. I am grateful for 
her tireless efforts to print the book in as errorless a manner as humanly possible.

I salute everyone, and others too many to list, for their individual and collective 
contributions, without which this book would not have been a reality. 

Simon Haykin 
Ancaster, Ontario 
Canada 

December, 2012 

Introduction


Historical Background 


In order to provide a sense of motivation, this introductory treatment of digital 
communications begins with a historical background of the subject, brief but succinct as it 
may be. In this first section of the introductory chapter we present some historical notes 
that identify the pioneering contributors to digital communications specifically, focusing 
on three important topics: information theory and coding, the Internet, and wireless 
communications. In their individual ways, these three topics have impacted digital 
communications in revolutionary ways. 


In 1948, the theoretical foundations of digital communications were laid down by Claude 
Shannon in a paper entitled “A mathematical theory of communication.” Shannon’s paper 
was received with immediate and enthusiastic acclaim. It was perhaps this response that 
emboldened Shannon to amend the title of his classic paper to “The mathematical theory 
of communication” when it was reprinted later in a book co-authored with Warren Weaver. 
It is noteworthy that, prior to the publication of Shannon’s 1948 classic paper, it was 
believed that increasing the rate of transmission over a channel would increase the 
probability of error; the communication theory community was taken by surprise when 
Shannon proved that this was not true, provided the transmission rate was below the 
channel capacity. 

Shannon’s 1948 paper was followed by three ground-breaking advances in coding 
theory, which include the following: 

Development of the first nontrivial error-correcting code by Golay in 1949 and 
Hamming in 1950. 

Development of turbo codes by Berrou, Glavieux and Thitimajshima in 1993; turbo
codes provide near-optimum error-correcting coding and decoding performance in 
additive white Gaussian noise. 

Rediscovery of low-density parity-check (LDPC) codes, which were first described 
by Gallager in 1962; the rediscovery occurred in 1981 when Tanner provided a new 
interpretation of LDPC codes from a graphical perspective. Most importantly, it was 
the discovery of turbo codes in 1993 that reignited interest in LDPC codes. 

From 1950 to 1970, various studies were made on computer networks. However, the most
significant of them all in terms of impact on computer communications was the Advanced 
Research Project Agency Network (ARPANET), which was put into service in 1971. The 
development of ARPANET was sponsored by the Advanced Research Projects Agency 
(ARPA) of the United States Department of Defense. The pioneering work in packet 
switching was done on the ARPANET. In 1985, ARPANET was renamed the Internet. 
However, the turning point in the evolution of the Internet occurred in 1990 when Berners-
Lee proposed a hypermedia software interface to the Internet, which he named the World 
Wide Web. Thereupon, in the space of only about 2 years, the Web went from nonexistence 
to worldwide popularity, culminating in its commercialization in 1994. The Internet has 
dramatically changed the way in which we communicate on a daily basis, using a 
wirelined network. 


In 1864, James Clerk Maxwell formulated the electromagnetic theory of light and 
predicted the existence of radio waves; the set of four equations that connect electric and 
magnetic quantities bears his name. Later on, in 1887, Heinrich Hertz demonstrated the
existence of radio waves experimentally. 

However, it was on December 12, 1901, that Guglielmo Marconi received a radio
signal at Signal Hill in Newfoundland; the radio signal had originated in Cornwall, 
England, 2100 miles away across the Atlantic. Last but by no means least, in the early 
days of wireless communications, it was Fessenden, a self-educated academic, who in 
1906 made history by conducting the first radio broadcast, transmitting music and voice 
using a technique that came to be known as amplitude modulation (AM) radio. 

In 1988, the first digital cellular system was introduced in Europe; it was known as the 
Global System for Mobile (GSM) Communications. Originally, GSM was intended to
provide a pan-European standard to replace the myriad of incompatible analog wireless 
communication systems. The introduction of GSM was soon followed by the North 
American IS-54 digital standard. As with the Internet, wireless communication has also 
dramatically changed the way we communicate on a daily basis. 

What we have just described under the three headings, namely, information theory and 
coding, the Internet, and wireless communications, have collectively not only made 
communications essentially digital, but have also changed the world of communications 
and made it global. 

The Communication Process 


Today, communication enters our daily lives in so many different ways that it is very easy 
to overlook the multitude of its facets. The telephones as well as mobile smart phones and 
devices at our hands, the radios and televisions in our living rooms, the computer terminals 
with access to the Internet in our offices and homes, and our newspapers are all capable of 
providing rapid communications from every corner of the globe. Communication provides 
the senses for ships on the high seas, aircraft in flight, and rockets and satellites in space. 
Communication through a wireless telephone keeps a car driver in touch with the office or 
home miles away, no matter where. Communication provides the means for social
networks to engage in different ways (texting, speaking, visualizing), whereby people are 
brought together around the world. Communication keeps a weather forecaster informed 
of conditions measured by a multitude of sensors and satellites. Indeed, the list of 
applications involving the use of communication in one way or another is almost endless. 

In the most fundamental sense, communication involves implicitly the transmission of 
information from one point to another through a succession of processes: 

The generation of a message signal - voice, music, picture, or computer data. 

The description of that message signal with a certain measure of precision, using a 
set of symbols - electrical, aural, or visual. 

The encoding of those symbols in a suitable form for transmission over a physical 
medium of interest. 

The transmission of the encoded symbols to the desired destination. 

The decoding and reproduction of the original symbols. 

The re-creation of the original message signal with some definable degradation in 
quality, the degradation being caused by unavoidable imperfections in the system. 

There are, of course, many other forms of communication that do not directly involve the 
human mind in real time. For example, in computer communications involving 
communication between two or more computers, human decisions may enter only in 
setting up the programs or commands for the computer, or in monitoring the results. 

Irrespective of the form of communication process being considered, there are three 
basic elements to every communication system, namely, transmitter, channel, and 
receiver, as depicted in Figure 1.1. The transmitter is located at one point in space, the 
receiver is located at some other point separate from the transmitter, and the channel is the 
physical medium that connects them together as an integrated communication system. The 
purpose of the transmitter is to convert the message signal produced by the source of 
information into a form suitable for transmission over the channel. However, as the 
transmitted signal propagates along the channel, it is distorted due to channel 
imperfections. Moreover, noise and interfering signals (originating from other sources) are 
added to the channel output, with the result that the received signal is a corrupted version 
of the transmitted signal. The receiver has the task of operating on the received signal so 
as to reconstruct a recognizable form of the original message signal for an end user or 
information sink. 
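
The transmitter, channel, and receiver roles described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the message is a stream of bits, the transmitter maps each bit to an antipodal level, the channel merely adds zero-mean Gaussian noise (channel distortion and interference from other sources are ignored), and the receiver decides each bit with a threshold at zero. The function names, the seed, and the noise level are hypothetical choices, not taken from the book.

```python
import random


def transmitter(bits, amplitude=1.0):
    """Map each message bit to an antipodal level: 1 -> +A, 0 -> -A."""
    return [amplitude if b else -amplitude for b in bits]


def channel(signal, noise_std, rng):
    """Assumed channel model: add zero-mean Gaussian noise to every sample."""
    return [s + rng.gauss(0.0, noise_std) for s in signal]


def receiver(received):
    """Reconstruct the message by comparing each received sample with zero."""
    return [1 if r > 0.0 else 0 for r in received]


def bit_error_rate(sent, decided):
    """Fraction of message bits that the receiver decided incorrectly."""
    errors = sum(1 for s, d in zip(sent, decided) if s != d)
    return errors / len(sent)


rng = random.Random(2012)  # fixed seed so that the run is repeatable
message = [rng.randint(0, 1) for _ in range(10_000)]

# Noise-free channel: the received signal equals the transmitted signal.
noiseless = receiver(channel(transmitter(message), 0.0, rng))

# Noisy channel: the received signal is a corrupted version of the transmitted one.
noisy = receiver(channel(transmitter(message), 1.0, rng))

print("bit error rate without noise:", bit_error_rate(message, noiseless))
print("bit error rate with noise:   ", bit_error_rate(message, noisy))
```

With the noise switched off the message is recovered exactly; with noise present a fraction of the bits is decided incorrectly, which is the degradation the receiver has to live with.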

There are two basic modes of communication:

Broadcasting, which involves the use of a single powerful transmitter and numerous 
receivers that are relatively inexpensive to build. Here, information-bearing signals 
flow only in one direction. 

Point-to-point communication, in which the communication process takes place over 
a link between a single transmitter and a receiver. In this case, there is usually a 
bidirectional flow of information-bearing signals, which requires the combined use 
of a transmitter and receiver (i.e., a transceiver) at each end of the link. 

The underlying communication process in every communication system, irrespective of its 
kind, is statistical in nature. Indeed, it is for this important reason that much of this book is 
devoted to the statistical underpinnings of digital communication systems. In so doing, we 
develop a wealth of knowledge on the fundamental issues involved in the study of digital 
communications. 

Multiple-Access Techniques 


Continuing with the communication process, multiple-access is a technique whereby 
many subscribers or local stations can share the use of a communication channel at the 
same time or nearly so, despite the fact that their individual transmissions may originate 
from widely different locations. Stated in another way, a multiple-access technique 
permits the communication resources of the channel to be shared by a large number of 
users seeking to communicate with each other. 

There are subtle differences between multiple access and multiplexing that should be 
noted: 

• Multiple access refers to the remote sharing of a communication channel such as a 
satellite or radio channel by users in highly dispersed locations. On the other hand, 
multiplexing refers to the sharing of a channel such as a telephone channel by users 
confined to a local site. 

• In a multiplexed system, user requirements are ordinarily fixed. In contrast, in a 
multiple-access system user requirements can change dynamically with time, in 
which case provisions are necessary for dynamic channel allocation. 

For obvious reasons it is desirable that in a multiple-access system the sharing of resources 
of the channel be accomplished without causing serious interference between users of the 
system. In this context, we may identify four basic types of multiple access: 

Frequency-division multiple access (FDMA). 

In this technique, disjoint subbands of frequencies are allocated to the different users 
on a continuous-time basis. In order to reduce interference between users allocated 
adjacent channel bands, guard bands are used to act as buffer zones, as illustrated in 
Figure 1.2a. These guard bands are necessary because of the impossibility of 
achieving ideal filtering or separating the different users. 

Time-division multiple access (TDMA).

In this second technique, each user is allocated the full spectral occupancy of the 
channel, but only for a short duration of time called a time slot. As shown in Figure 
1.2b, buffer zones in the form of guard times are inserted between the assigned time 
slots. This is done to reduce interference between users by allowing for time
uncertainty that arises due to system imperfections, especially in synchronization 
schemes. 

Code-division multiple access (CDMA). 

In FDMA, the resources of the channel are shared by dividing them along the 
frequency coordinate into disjoint frequency bands, as illustrated in Figure 1.2a. In 
TDMA, the resources are shared by dividing them along the time coordinate into 
disjoint time slots, as illustrated in Figure 1.2b. In Figure 1.2c, we illustrate another 
technique for sharing the channel resources by using a hybrid combination of 
FDMA and TDMA, which represents a specific form of code-division multiple 
access (CDMA). For example, frequency hopping may be employed to ensure that 
during each successive time slot, the frequency bands assigned to the users are 
reordered in an essentially random manner. To be specific, during time slot 1, user 1
occupies frequency band 1, user 2 occupies frequency band 2, user 3 occupies 
frequency band 3, and so on. During time slot 2, user 1 hops to frequency band 3, 
user 2 hops to frequency band 1, user 3 hops to frequency band 2, and so on. Such an 
arrangement has the appearance of the users playing a game of musical chairs. An 
important advantage of CDMA over both FDMA and TDMA is that it can provide 
for secure communications. In the type of CDMA illustrated in Figure 1.2c, the 
frequency hopping mechanism can be implemented through the use of a pseudo- 
noise (PN) sequence. 

Space-division multiple access (SDMA). 

In this multiple-access technique, resource allocation is achieved by exploiting the 
spatial separation of the individual users. In particular, multibeam antennas are used 
to separate radio signals by pointing them along different directions. Thus, different 
users are enabled to access the channel simultaneously on the same frequency or in 
the same time slot. 

These multiple-access techniques share a common feature: allocating the communication 
resources of the channel through the use of disjointedness (or orthogonality in a loose 
sense) in time, frequency, or space. 

Figure 1.2 Illustrating the ideas behind multiple-access techniques: (a) frequency-division 
multiple access; (b) time-division multiple access; (c) frequency-hop multiple access. 

Networks 


A communication network, or simply network, illustrated in Figure 1.3, consists of an 
interconnection of a number of nodes made up of intelligent processors (e.g., 
microcomputers). The primary purpose of these nodes is to route data through the 
network. Each node has one or more stations attached to it; stations refer to devices 
wishing to communicate. The network is designed to serve as a shared resource for 
moving data exchanged between stations in an efficient manner and also to provide a 
framework to support new applications and services. The traditional telephone network is 
an example of a communication network in which circuit switching is used to provide a 
dedicated communication path or circuit between two stations. The circuit consists of a 
connected sequence of links from source to destination. The links may consist of time 
slots in a time-division multiplexed (TDM) system or frequency slots in a frequency- 
division multiplexed (FDM) system. The circuit, once in place, remains uninterrupted for 
the entire duration of transmission. Circuit switching is usually controlled by a centralized 
hierarchical control mechanism with knowledge of the network’s organization. To 
establish a circuit-switched connection, an available path through the network is seized 
and then dedicated to the exclusive use of the two stations wishing to communicate. In 
particular, a call-request signal must propagate all the way to the destination, and be 
acknowledged, before transmission can begin. Then, the network is effectively transparent 
to the users. This means that, during the connection time, the bandwidth and resources 
allocated to the circuit are essentially “owned” by the two stations, until the circuit is 
disconnected. The circuit thus represents an efficient use of resources only to the extent 
that the allocated bandwidth is properly utilized. Although the telephone network is used 
to transmit data, voice constitutes the bulk of the network’s traffic. Indeed, circuit 
switching is well suited to the transmission of voice signals, since voice conversations 
tend to be of long duration (about 2 min on average) compared with the time required for 
setting up the circuit (about 0.1-0.5 s). Moreover, in most voice conversations, there is 
information flow for a relatively large percentage of the connection time, which makes 
circuit switching all the more suitable for voice conversations. 



Figure 1.3 Communication network. 

In circuit switching, a communication link is shared between the different sessions 
using that link on a fixed allocation basis. In packet switching, on the other hand, the 
sharing is done on a demand basis and, therefore, it has an advantage over circuit 
switching in that when a link has traffic to send, the link may be more fully utilized. 

The basic network principle of packet switching is “store and forward.” Specifically, in 
a packet-switched network, any message larger than a specified size is subdivided prior to 
transmission into segments not exceeding the specified size. The segments are commonly 
referred to as packets. The original message is reassembled at the destination on a packet- 
by-packet basis. The network may be viewed as a distributed pool of network resources 
(i.e., channel bandwidth, buffers, and switching processors) whose capacity is shared 
dynamically by a community of competing users (stations) wishing to communicate. In 
contrast, in a circuit-switched network, resources are dedicated to a pair of stations for the 
entire period they are in session. Accordingly, packet switching is far better suited to a 
computer-communication environment in which “bursts” of data are exchanged between 
stations on an occasional basis. The use of packet switching, however, requires that careful 
control be exercised on user demands; otherwise, the network may be seriously abused. 
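
The subdivision and reassembly steps described above may be sketched as follows. This is a minimal illustration under stated assumptions: the sequence number attached to each packet and the function names are hypothetical, and no routing, buffering, or error handling is modeled.

```python
def packetize(message: bytes, max_size: int):
    """Split a message into (sequence_number, payload) packets.

    No payload exceeds max_size bytes. The sequence number is an assumed
    header field that lets the destination restore the original order.
    """
    return [(seq, message[start:start + max_size])
            for seq, start in enumerate(range(0, len(message), max_size))]


def reassemble(packets):
    """Rebuild the message from packets that may arrive out of order."""
    return b"".join(payload for _, payload in sorted(packets))


if __name__ == "__main__":
    message = b"bursts of data exchanged between stations"
    packets = packetize(message, max_size=8)
    packets.reverse()                      # pretend the network reordered them
    print(reassemble(packets) == message)
```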

The design of a data network (i.e., a network in which the stations are all made up of 
computers and terminals) may proceed in an orderly way by looking at the network in 
terms of a layered architecture, regarded as a hierarchy of nested layers. A layer refers to a 
process or device inside a computer system, designed to perform a specific function. 
Naturally, the designers of a layer will be intimately familiar with its internal details and 
operation. At the system level, however, a user views the layer merely as a “black box” 
that is described in terms of the inputs, the outputs, and the functional relationship 
between outputs and inputs. In a layered architecture, each layer regards the next lower 
layer as one or more black boxes with some given functional specification to be used by 
the given higher layer. Thus, the highly complex communication problem in data networks 
is resolved as a manageable set of well-defined interlocking functions. It is this line of 
reasoning that has led to the development of the open systems interconnection (OSI) 
reference model by a subcommittee of the International Organization for Standardization. 
The term “open” refers to the ability of any two systems conforming to the reference 
model and its associated standards to interconnect. 

In the OSI reference model, the communications and related-connection functions are 
organized as a series of layers or levels with well-defined interfaces, and with each layer 
built on its predecessor. In particular, each layer performs a related subset of primitive 
functions, and it relies on the next lower layer to perform additional primitive functions. 
Moreover, each layer offers certain services to the next higher layer and shields the latter 
from the implementation details of those services. Between each pair of layers, there is an 
interface. It is the interface that defines the services offered by the lower layer to the upper 
layer. 

The OSI model is composed of seven layers, as illustrated in Figure 1.4; this figure also 
includes a description of the functions of the individual layers of the model. Layer k on 
system A, say, communicates with layer k on some other system B in accordance with a set 
of rules and conventions, collectively constituting the layer k protocol, where k = 1, 2, ..., 
7. (The term “protocol” has been borrowed from common usage, describing conventional 
social behavior between human beings.) The entities that comprise the corresponding 
layers on different systems are referred to as peer processes. In other words, 
communication is achieved by having the peer processes in two different systems 
communicate via a protocol, with the protocol itself being defined by a set of rules of 
procedure. Physical communication between peer processes exists only at layer 1. On the 
other hand, layers 2 through 7 are in virtual communication with their distant peers. 
However, each of these six layers can exchange data and control information with its 
neighboring layers (below and above) through layer-to-layer interfaces. In Figure 1.4, 
physical communication is shown by solid lines and virtual communication by dashed 
lines. The major principles involved in arriving at seven layers of the OSI reference model 
are as follows: 

Each layer performs well-defined functions. 

A boundary is created at a point where the description of services offered is small 
and the number of interactions across the boundary is the minimum possible. 

A layer is created from easily localized functions, so that the architecture of the 
model may permit modifications to the layer protocol to reflect changes in 
technology without affecting the other layers. 

A boundary is created at some point with an eye toward standardization of the 
associated interface. 

A layer is created only when a different level of abstraction is needed to handle the data. 

The number of layers employed should be large enough to assign distinct functions to 
different layers, yet small enough to maintain a manageable architecture for the model. 

Note that the OSI reference model is not a network architecture; rather, it is an 
international standard for computer communications, which just tells what each layer 
should do. 
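
The idea that each layer serves the layer above it while hiding its own details can be pictured with a toy encapsulation sketch. The layer names below are the standard names of the seven OSI layers; the bracketed text tags standing in for headers, and the functions send and receive, are purely hypothetical devices for illustration and do not correspond to any real protocol format.

```python
# Standard OSI layer names, listed from layer 7 down to layer 1.
LAYERS = ["application", "presentation", "session", "transport",
          "network", "data link", "physical"]


def send(payload: str) -> str:
    """Pass data down the stack; each layer prepends its own tag."""
    for name in LAYERS:
        payload = f"[{name}]{payload}"
    return payload


def receive(frame: str) -> str:
    """Pass data up the stack; each layer removes only its peer's tag."""
    for name in reversed(LAYERS):
        tag = f"[{name}]"
        if not frame.startswith(tag):
            raise ValueError(f"layer '{name}' did not find its peer's tag")
        frame = frame[len(tag):]
    return frame


if __name__ == "__main__":
    frame = send("hello")
    print(frame)
    print(receive(frame))
```

Each layer in this sketch touches only its own tag, which loosely mirrors the statement that peer processes communicate through a layer protocol while relying on the layers beneath them.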

Digital Communications 


Today’s public communication networks are highly complicated systems. Specifically, 
public switched telephone networks (collectively referred to as PSTNs), the Internet, and 
wireless communications (including satellite communications) provide seamless 
connections between cities, across oceans, and between different countries, languages, and 
cultures; hence the reference to the world as a “global village.” 

Three layers of the OSI model bear directly on the design of digital 
communication systems, which is the subject of interest of this book: 

Physical layer. This lowest layer of the OSI model embodies the physical 
mechanism involved in transmitting bits (i.e., binary digits) between any pair of 
nodes in the communication network. Communication between the two nodes is 
accomplished by means of modulation in the transmitter, transmission across the 
channel, and demodulation in the receiver. The module for performing modulation 
and demodulation is often called a modem. 

Data-link layer. Communication links are nearly always corrupted by the 
unavoidable presence of noise and interference. One purpose of the data-link layer, 
therefore, is to perform error correction or detection, although this function is also 
shared with the physical layer. Often, the data-link layer will retransmit packets that 
are received in error but, for some applications, it discards them. This layer is also 
responsible for the way in which different users share the transmission medium. A 
portion of the data-link layer, called the medium access control (MAC) sublayer, is 
responsible for allowing frames to be sent over the shared transmission media 
without undue interference with other nodes. This aspect is referred to as multiple- 
access communications. 

Network layer. This layer has several functions, one of which is to determine the 
routing of information, to get it from the source to its ultimate destination. A second 
function is to determine the quality of service. A third function is flow control, to 
ensure that the network does not become congested. 

These are three layers of a seven-layer model for the functions that occur in the 
communications process. Although the three layers occupy a subspace within the OSI 
model, the functions that they perform are of critical importance to the model. 


Typically, in the design of a digital communication system the information source, 
communication channel, and information sink (end user) are all specified. The challenge is 
to design the transmitter and the receiver with the following guidelines in mind: 

• Encode/modulate the message signal generated by the source of information, 
transmit it over the channel, and produce an “estimate” of it at the receiver output 
that satisfies the requirements of the end user. 

• Do all of this at an affordable cost. 

In a digital communication system represented by the block diagram of Figure 1.6, the 
rationale for which is rooted in information theory, the functional blocks of the transmitter 
and the receiver starting from the far end of the channel are paired as follows: 

• source encoder-decoder; 

• channel encoder-decoder; 

• modulator-demodulator. 
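
To make the three pairings concrete, the following Python sketch strings together toy versions of each block. Every specific choice here is an assumption made for illustration: the four-symbol alphabet with its prefix code, the three-fold repetition code standing in for the channel encoder, and the mapping of bits to the amplitudes +1 and -1 standing in for the modulator.

```python
# Hypothetical prefix code: frequent symbols get shorter codewords.
CODEBOOK = {"A": "0", "B": "10", "C": "110", "D": "111"}


def source_encode(message):
    return "".join(CODEBOOK[symbol] for symbol in message)


def source_decode(bits):
    inverse = {code: symbol for symbol, code in CODEBOOK.items()}
    symbols, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:            # a complete codeword has been read
            symbols.append(inverse[word])
            word = ""
    return "".join(symbols)


def channel_encode(bits, repeat=3):
    """Add controlled redundancy by repeating every bit."""
    return "".join(bit * repeat for bit in bits)


def channel_decode(bits, repeat=3):
    """Majority vote over each group of repeated bits."""
    return "".join("1" if bits[i:i + repeat].count("1") > repeat // 2 else "0"
                   for i in range(0, len(bits), repeat))


def modulate(bits):
    return [1.0 if bit == "1" else -1.0 for bit in bits]


def demodulate(samples):
    return "".join("1" if sample > 0 else "0" for sample in samples)


if __name__ == "__main__":
    message = "ABACAD"
    transmitted = modulate(channel_encode(source_encode(message)))
    received = list(transmitted)
    received[4] = -received[4]         # the channel corrupts one sample
    estimate = source_decode(channel_decode(demodulate(received)))
    print(estimate == message)
```

The receiver applies the inverse operations in reverse order, and the redundancy added by the channel encoder is what allows the single corrupted sample to be corrected.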

The source encoder removes redundant information from the message signal and is 
responsible for efficient use of the channel. The resulting sequence of symbols is called 
the source codeword. The data stream is processed next by the channel encoder, which 
produces a new sequence of symbols called the channel codeword. The channel codeword 
is longer than the source codeword by virtue of the controlled redundancy built into its 
construction. Finally, the modulator represents each symbol of the channel codeword by a 
corresponding analog symbol, appropriately selected from a finite set of possible analog 
symbols. The sequence of analog symbols produced by the modulator is called a 
waveform, which is suitable for transmission over the channel. At the receiver, the channel 
output (received signal) is processed in reverse order to that in the transmitter, thereby 
reconstructing a recognizable version of the original message signal. The reconstructed 
message signal is finally delivered to the user of information at the destination. From this 
description it is apparent that the design of a digital communication system is rather 
complex in conceptual terms but easy to build. Moreover, the system is robust, offering 
greater tolerance of physical effects (e.g., temperature variations, aging, mechanical 
vibrations) than its analog counterpart; hence the ever-increasing use of digital 
communications. 

Figure 1.6 Block diagram of a digital communication system. 


Organization of the Book 


The main part of the book is organized in ten chapters, which, after this introductory 
chapter, are organized into five parts of varying sizes as summarized herein. 

Mathematical Background 

Chapter 2 presents a detailed treatment of the Fourier transform, its properties and 
algorithmic implementations. This chapter also includes two important related topics: 

• The Hilbert transform, which provides the mathematical basis for transforming 
real-valued band-pass signals and systems into their low-pass equivalent 
representations without loss of information. 

• Overview of analog modulation theory, thereby facilitating an insightful link 
between analog and digital communications. 

Chapter 3 presents a mathematical review of probability theory and Bayesian 
inference, the understanding of which is essential to the study of digital 
communications. 

Chapter 4 is devoted to the study of stochastic processes, the theory of which is 
basic to the characterization of sources of information and communication channels. 

Chapter 5 discusses the fundamental limits of information theory, postulated in 
terms of source coding, channel capacity, and rate-distortion theory. 

Transition from Analog to Digital Communications 

This material is covered in Chapter 6. Simply put, the study therein discusses the 
different ways in which analog waveforms are converted into digitally encoded 
sequences. 

Signaling Techniques 

This third part of the book includes three chapters: 

• Chapter 7 discusses the different techniques for signaling over additive white 
Gaussian noise (AWGN) channels. 

• Chapter 8 discusses signaling over band-limited channels, as in data transmission 
over telephonic channels and the Internet. 

• Chapter 9 is devoted to signaling over fading channels, as in wireless 
communications. 

Error-Control Coding 

The reliability of data transmission over a communication channel is of profound 
practical importance. Chapter 10 studies the different methods for the encoding of 
message sequences in the transmitter and decoding them in the receiver. Here, we 
cover two classes of error-control coding techniques: 

• classic codes rooted in algebraic mathematics, and 

• new generation of probabilistic compound codes, exemplified by turbo codes and 
LDPC codes. 

Appendices 

Last but by no means least, the book includes appendices to provide back-up 
material for different chapters in the book, as they are needed. 

Notes 


1. For a detailed discussion on communication networks, see the classic book by Tanenbaum, 
entitled Computer Networks (2003). 

2. The OSI reference model was developed by a subcommittee of the International Organization for 
Standardization (ISO) in 1977. For a discussion of the principles involved in arriving at the seven 
layers of the OSI model and a description of the layers themselves, see Tanenbaum (2003). 

Introduction 


In this study, the representation of signals and systems features prominently. More 
specifically, the Fourier transform plays a key role in this representation. 

The Fourier transform provides the mathematical link between the time-domain 
representation (i.e., waveform) of a signal and its frequency-domain description (i.e., 
spectrum). Most importantly, we can go back and forth between these two descriptions of 
the signal with no loss of information. Indeed, we may invoke a similar transformation in 
the representation of linear systems. In this latter case, the time-domain and frequency- 
domain descriptions of a linear time-invariant system are defined in terms of its impulse 
response and frequency response, respectively. 

In light of this background, it is in order that we begin a mathematical study of 
communication systems by presenting a review of Fourier analysis. This review, in turn, 
paves the way for the formulation of simplified representations of band-pass signals and 
systems to which we resort in subsequent chapters. We begin the study by developing the 
transition from the Fourier series representation of a periodic signal to the Fourier 
transform representation of a nonperiodic signal; this we do in the next two sections. 

The Fourier Series 


Let $g_{T_0}(t)$ denote a periodic signal, where the subscript $T_0$ denotes the duration of 
periodicity. By using a Fourier series expansion of this signal, we are able to resolve it into 
an infinite sum of sine and cosine terms, as shown by 

$$g_{T_0}(t) = a_0 + 2\sum_{n=1}^{\infty}\left[a_n\cos(2\pi n f_0 t) + b_n\sin(2\pi n f_0 t)\right] \tag{2.1}$$

where 

$$f_0 = \frac{1}{T_0} \tag{2.2}$$

is the fundamental frequency. The coefficients $a_n$ and $b_n$ represent the amplitudes of the 
cosine and sine terms, respectively. The quantity $nf_0$ represents the nth harmonic of the 
fundamental frequency $f_0$. Each of the terms $\cos(2\pi n f_0 t)$ and $\sin(2\pi n f_0 t)$ is called a basis 
function. These basis functions form an orthogonal set over the interval $T_0$, in that they 
satisfy three conditions: 

$$\int_{-T_0/2}^{T_0/2}\cos(2\pi m f_0 t)\cos(2\pi n f_0 t)\,dt = \begin{cases} T_0/2, & m = n \\ 0, & m \neq n \end{cases} \tag{2.3}$$

$$\int_{-T_0/2}^{T_0/2}\cos(2\pi m f_0 t)\sin(2\pi n f_0 t)\,dt = 0 \quad \text{for all } m \text{ and } n \tag{2.4}$$

$$\int_{-T_0/2}^{T_0/2}\sin(2\pi m f_0 t)\sin(2\pi n f_0 t)\,dt = \begin{cases} T_0/2, & m = n \\ 0, & m \neq n \end{cases} \tag{2.5}$$

To determine the coefficient $a_0$, we integrate both sides of (2.1) over a complete period. 
We thus find that $a_0$ is the mean value of the periodic signal $g_{T_0}(t)$ over one period, as 
shown by the time average 

$$a_0 = \frac{1}{T_0}\int_{-T_0/2}^{T_0/2} g_{T_0}(t)\,dt \tag{2.6}$$


To determine the coefficient $a_n$, we multiply both sides of (2.1) by $\cos(2\pi n f_0 t)$ and 
integrate over the interval $-T_0/2$ to $T_0/2$. Then, using (2.3) and (2.4), we find that 

$$a_n = \frac{1}{T_0}\int_{-T_0/2}^{T_0/2} g_{T_0}(t)\cos(2\pi n f_0 t)\,dt, \qquad n = 1, 2, \ldots \tag{2.7}$$

Similarly, we find that 

$$b_n = \frac{1}{T_0}\int_{-T_0/2}^{T_0/2} g_{T_0}(t)\sin(2\pi n f_0 t)\,dt, \qquad n = 1, 2, \ldots \tag{2.8}$$


A basic question that arises at this point is whether the Fourier series expansion of (2.1) 
actually converges to the periodic signal $g_{T_0}(t)$ that it is meant to represent. 

To resolve this fundamental issue, we have to show that, for the coefficients $a_0$, $a_n$, and $b_n$ 
calculated in accordance with (2.6) to (2.8), this series will indeed converge to $g_{T_0}(t)$. In 
general, for a periodic signal $g_{T_0}(t)$ of arbitrary waveform, there is no guarantee that the 
series of (2.1) will converge to $g_{T_0}(t)$ or that the coefficients $a_0$, $a_n$, and $b_n$ will even exist. 
In a rigorous sense, we may say that a periodic signal $g_{T_0}(t)$ can be expanded in a Fourier 
series if the signal $g_{T_0}(t)$ satisfies the Dirichlet conditions: 

The function $g_{T_0}(t)$ is single valued within the interval $T_0$. 

The function $g_{T_0}(t)$ has at most a finite number of discontinuities in the interval $T_0$. 

The function $g_{T_0}(t)$ has a finite number of maxima and minima in the interval $T_0$. 

The function $g_{T_0}(t)$ is absolutely integrable; that is, 

$$\int_{-T_0/2}^{T_0/2} \left|g_{T_0}(t)\right| dt < \infty$$

From an engineering perspective, however, it suffices to say that the Dirichlet conditions 
are satisfied by the periodic signals encountered in communication systems. 
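
As a numerical illustration of (2.6) to (2.8), the sketch below estimates the coefficients of a rectangular pulse train and evaluates a truncated version of the series (2.1). The choice of signal (unit amplitude for |t| < T_0/4, zero elsewhere in the period, with T_0 = 1), the midpoint-rule integration, and the function names are all assumptions made for this example. For this signal the integrals can also be done by hand, giving a_0 = 1/2, a_1 = 1/π, and b_n = 0.

```python
import math

T0 = 1.0          # assumed period
F0 = 1.0 / T0     # fundamental frequency, as in (2.2)


def g(t):
    """Rectangular pulse train: 1 for |t| < T0/4 within each period, else 0."""
    t = (t + T0 / 2) % T0 - T0 / 2          # fold t into [-T0/2, T0/2)
    return 1.0 if abs(t) < T0 / 4 else 0.0


def period_average(func, steps=4000):
    """Midpoint-rule estimate of (1/T0) times the integral over one period."""
    dt = T0 / steps
    return sum(func(-T0 / 2 + (k + 0.5) * dt) for k in range(steps)) * dt / T0


def a(n):
    """Cosine coefficient; n = 0 gives the mean value of (2.6)."""
    return period_average(lambda t: g(t) * math.cos(2 * math.pi * n * F0 * t))


def b(n):
    """Sine coefficient of (2.8)."""
    return period_average(lambda t: g(t) * math.sin(2 * math.pi * n * F0 * t))


def partial_sum(t, terms):
    """Series (2.1) truncated after the given number of harmonics."""
    total = a(0)
    for n in range(1, terms + 1):
        total += 2 * (a(n) * math.cos(2 * math.pi * n * F0 * t)
                      + b(n) * math.sin(2 * math.pi * n * F0 * t))
    return total


if __name__ == "__main__":
    print("a0 =", a(0), " a1 =", a(1), " b1 =", b(1))
    print("partial sum at t = 0 with 51 harmonics:", partial_sum(0.0, 51))
```

The truncated sum approaches the pulse height at t = 0 only slowly as harmonics are added, which is typical of a signal with discontinuities.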


The Fourier series of (2.1) can be put into a much simpler and more elegant form with the 
use of complex exponentials. We do this by substituting into (2.1) the exponential forms 
for the cosine and sine, namely: 

$$\cos(2\pi n f_0 t) = \frac{1}{2}\left[\exp(j2\pi n f_0 t) + \exp(-j2\pi n f_0 t)\right]$$

$$\sin(2\pi n f_0 t) = \frac{1}{2j}\left[\exp(j2\pi n f_0 t) - \exp(-j2\pi n f_0 t)\right]$$

where $j = \sqrt{-1}$. We thus obtain 

$$g_{T_0}(t) = a_0 + \sum_{n=1}^{\infty}\left[(a_n - jb_n)\exp(j2\pi n f_0 t) + (a_n + jb_n)\exp(-j2\pi n f_0 t)\right] \tag{2.9}$$

Let $c_n$ denote a complex coefficient related to $a_n$ and $b_n$ by 

$$c_n = \begin{cases} a_n - jb_n, & n > 0 \\ a_0, & n = 0 \\ a_{|n|} + jb_{|n|}, & n < 0 \end{cases} \tag{2.10}$$

Then, we may simplify (2.9) into 

$$g_{T_0}(t) = \sum_{n=-\infty}^{\infty} c_n\exp(j2\pi n f_0 t) \tag{2.11}$$

where 

$$c_n = \frac{1}{T_0}\int_{-T_0/2}^{T_0/2} g_{T_0}(t)\exp(-j2\pi n f_0 t)\,dt, \qquad n = 0, \pm 1, \pm 2, \ldots \tag{2.12}$$

The series expansion of (2.11) is referred to as the complex exponential Fourier series. 
The $c_n$ themselves are called the complex Fourier coefficients. 

The integral on the right-hand side of (2.12) is said to be an inner product of the signal 
$g_{T_0}(t)$ with the basis functions $\exp(-j2\pi n f_0 t)$, by whose linear combination all square 
integrable functions can be expressed as in (2.11). 

According to this representation, a periodic signal contains all frequencies (both 
positive and negative) that are harmonically related to the fundamental frequency $f_0$. The 
presence of negative frequencies is simply a result of the fact that the mathematical model 
of the signal as described by (2.11) requires the use of negative frequencies. Indeed, this 
representation also requires the use of complex-valued basis functions, namely 
$\exp(j2\pi n f_0 t)$, which have no physical meaning either. The reason for using 
complex-valued basis functions and negative frequency components is merely to provide a compact 
mathematical description of a periodic signal, which is well-suited for both theoretical and 
practical work. 
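
A short numerical check may help to show how the positive- and negative-frequency terms combine into a real signal. The sketch below evaluates the coefficient integral (2.12) for an assumed sawtooth signal (equal to t over one period, with T_0 = 1) and then sums a truncated version of (2.11). The signal, the midpoint-rule integration, and the function names are our own choices for illustration. For this signal the integral can also be done by hand, giving c_1 = -j/(2π).

```python
import cmath
import math

T0 = 1.0          # assumed period
F0 = 1.0 / T0
STEPS = 4000      # number of midpoint-rule samples per period


def g(t):
    """One period of a sawtooth on [-T0/2, T0/2): a real, odd test signal."""
    return t


def c(n):
    """Midpoint-rule estimate of the complex Fourier coefficient in (2.12)."""
    dt = T0 / STEPS
    total = 0j
    for k in range(STEPS):
        t = -T0 / 2 + (k + 0.5) * dt
        total += g(t) * cmath.exp(-2j * math.pi * n * F0 * t)
    return total * dt / T0


def synthesize(t, max_harmonic):
    """Truncated version of (2.11), summing n = -max_harmonic..max_harmonic."""
    return sum(c(n) * cmath.exp(2j * math.pi * n * F0 * t)
               for n in range(-max_harmonic, max_harmonic + 1))


if __name__ == "__main__":
    print("c(1)  =", c(1))
    print("c(-1) =", c(-1))          # complex conjugate of c(1) for real g
    print("g(0.25) from 50 harmonics:", synthesize(0.25, 50))
```

Because the test signal is real, each negative-frequency coefficient is the complex conjugate of its positive-frequency partner, so the imaginary parts cancel in the sum.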

The Fourier Transform 


In the previous section, we used the Fourier series to represent a periodic signal. We now 
wish to develop a similar representation for a signal g(t) that is nonperiodic. In order to do 
this, we first construct a periodic function $g_{T_0}(t)$ of period $T_0$ in such a way that g(t) 
defines exactly one cycle of this periodic function, as illustrated in Figure 2.1. In the limit, 
we let the period $T_0$ become infinitely large, so that we may express g(t) as 

$$g(t) = \lim_{T_0 \to \infty} g_{T_0}(t) \tag{2.13}$$



Figure 2.1 Illustrating the use of an arbitrarily defined function of time to 
construct a periodic waveform. (a) Arbitrarily defined function of time g(t). 
(b) Periodic waveform $g_{T_0}(t)$ based on g(t). 

Representing the periodic function $g_{T_0}(t)$ in terms of the complex exponential form of the 
Fourier series, we write 

$$g_{T_0}(t) = \sum_{n=-\infty}^{\infty} c_n\exp\left(\frac{j2\pi n t}{T_0}\right)$$

where 

$$c_n = \frac{1}{T_0}\int_{-T_0/2}^{T_0/2} g_{T_0}(t)\exp\left(-\frac{j2\pi n t}{T_0}\right)dt$$

Here, we have purposely replaced $f_0$ with $1/T_0$ in the exponents. Define 

$$\Delta f = \frac{1}{T_0}$$

$$f_n = \frac{n}{T_0}$$

and 

$$G(f_n) = c_n T_0$$

We may then go on to modify the original Fourier series representation of $g_{T_0}(t)$ given in 
(2.11) into a new form described by 

$$g_{T_0}(t) = \sum_{n=-\infty}^{\infty} G(f_n)\exp(j2\pi f_n t)\,\Delta f \tag{2.14}$$

where 

$$G(f_n) = \int_{-T_0/2}^{T_0/2} g_{T_0}(t)\exp(-j2\pi f_n t)\,dt \tag{2.15}$$

Equations (2.14) and (2.15) apply to a periodic signal $g_{T_0}(t)$. What we would like to do 
next is to go one step further and develop a corresponding pair of formulas that apply to a 
nonperiodic signal g(t). To make this transition, we use the defining equation (2.13). 
Specifically, two things happen: 

The discrete frequency $f_n$ in (2.14) and (2.15) approaches the continuous frequency 
variable f. 

The discrete sum of (2.14) becomes an integral defining the area under the function 
G(f)exp(j2πft), integrated with respect to frequency f. 

Accordingly, piecing these points together, we may respectively rewrite the limiting forms 
of (2.15) and (2.14) as 


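$$G(f) = \int_{-\infty}^{\infty} g(t)\exp(-j2\pi f t)\,dt \tag{2.16}$$

and 

$$g(t) = \int_{-\infty}^{\infty} G(f)\exp(j2\pi f t)\,df \tag{2.17}$$

In words, (2.16) defines the Fourier transform G(f) of the nonperiodic signal g(t), and 
(2.17) defines the inverse Fourier transform, which recovers g(t) from G(f). 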


Figure 2.2 illustrates the interplay between these two formulas, where we see that the 
frequency-domain description based on (2.16) plays the role of analysis and the time- 
domain description based on (2.17) plays the role of synthesis. 

From a notational point of view, note that in (2.16) and (2.17) we have used a lowercase 
letter to denote the time function and an uppercase letter to denote the corresponding 
frequency function. Note also that these two equations are of identical mathematical form, 
except for changes in the algebraic signs of the exponents. 

For the Fourier transform of a signal g(t) to exist, it is sufficient but not necessary that 
the nonperiodic signal g(t) satisfies three Dirichlet conditions of its own: 

The function g(t) is single valued, with a finite number of maxima and minima in 
any finite time interval. 

The function g(t) has a finite number of discontinuities in any finite time interval. 

The function g(t) is absolutely integrable; that is, 

$$\int_{-\infty}^{\infty} |g(t)|\,dt < \infty$$

In practice, we may safely ignore the question of the existence of the Fourier transform of 
a time function g(t) when it is an accurately specified description of a physically realizable 
signal. In other words, physical realizability is a sufficient condition for the existence of a 
Fourier transform. Indeed, we may go one step further and state that all energy signals are 
Fourier transformable. A signal g(t) is said to be an energy signal if the condition 

$$\int_{-\infty}^{\infty} |g(t)|^2\,dt < \infty$$

holds. 


Figure 2.2 Sketch of the interplay between the synthesis and analysis equations embodied 
in Fourier transformation. The analysis equation maps the time-domain description g(t) 
into the frequency-domain description G(f); the synthesis equation maps G(f) back into g(t). 

The Fourier transform provides the mathematical tool for measuring the frequency 
content, or spectrum, of a signal. For this reason, the terms Fourier transform and 
spectrum are used interchangeably. Thus, given a signal g(t) with Fourier transform G(f), 
we may refer to G(f) as the spectrum of the signal g(t). By the same token, we refer to 
|G(f)| as the magnitude spectrum of the signal g(t), and refer to arg[G(f)] as its phase 
spectrum. 

If the signal g(t) is real valued, then the magnitude spectrum of the signal is an even 
function of frequency f, while the phase spectrum is an odd function of f. In such a case, 
knowledge of the spectrum of the signal for positive frequencies uniquely defines the 
spectrum for negative frequencies. 
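
The even and odd symmetries just stated are easy to check numerically. In the sketch below, the one-sided decaying exponential is an assumed test signal, chosen because it is real valued but not symmetric in time, and the transform integral is approximated by the midpoint rule over a finite interval. Its transform is also known in closed form, 1/(1 + j2πf), which gives a second check.

```python
import cmath
import math


def g(t):
    """Real-valued test signal: exp(-t) for t >= 0, zero for t < 0."""
    return math.exp(-t) if t >= 0 else 0.0


def fourier_transform(f, t_max=20.0, steps=20000):
    """Midpoint-rule estimate of the integral of g(t) exp(-j 2 pi f t) dt.

    The integration runs over [0, t_max] only, since g(t) is zero for t < 0
    and negligible beyond t_max.
    """
    dt = t_max / steps
    total = 0j
    for k in range(steps):
        t = (k + 0.5) * dt
        total += g(t) * cmath.exp(-2j * math.pi * f * t)
    return total * dt


if __name__ == "__main__":
    for f in (0.25, 0.5, 1.0):
        positive, negative = fourier_transform(f), fourier_transform(-f)
        print(f"f = {f}: |G(f)| = {abs(positive):.6f}, |G(-f)| = {abs(negative):.6f}, "
              f"arg G(f) = {cmath.phase(positive):.6f}, arg G(-f) = {cmath.phase(negative):.6f}")
```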


For convenience of presentation, it is customary to express (2.16) in the short-hand form 

$$G(f) = \mathbf{F}[g(t)]$$

where $\mathbf{F}$ plays the role of an operator. In a corresponding way, (2.17) is expressed in the 
short-hand form 

$$g(t) = \mathbf{F}^{-1}[G(f)]$$

where $\mathbf{F}^{-1}$ plays the role of an inverse operator. 

The time function g(t) and the corresponding frequency function G(f) are said to 
constitute a Fourier-transform pair. To emphasize this point, we write 

$$g(t) \rightleftharpoons G(f)$$

where the top arrow indicates the forward transformation from g(t) to G(f) and the bottom 
arrow indicates the inverse transformation. One other notation: the asterisk is used to 
denote complex conjugation. 


To assist the user of this book, two tables of Fourier transformations are included: 

Table 2.1 on page 23 summarizes the properties of Fourier transforms; proofs of 
them are presented as end-of-chapter problems. 

Table 2.2 on page 24 presents a list of Fourier-transform pairs, where the items 
listed on the left-hand side of the table are time functions and those in the center 
column are their Fourier transforms. 

Example 1: Binary Sequence for Energy Calculations 

Consider the five-digit binary sequence 10010. This sequence is represented by two 
different waveforms, one based on the rectangular function rect(t), and the other based on 
the sinc function sinc(t). Both waveforms are denoted by g(t) and, as demonstrated next, 
both have exactly the same total energy. 

Case 1: rect(t) as the basis function. 

Let binary symbol 1 be represented by +rect(t) and binary symbol 0 be represented by 
-rect(t). Accordingly, the binary sequence 10010 is represented by the waveform 
shown in Figure 2.3. From this figure, we readily see that, regardless of the 
representation ±rect(t), each symbol contributes a single unit of energy; hence the total 
energy for Case 1 is five units. 

Figure 2.3 Waveform of binary sequence 10010, using rect(t) for symbol 1 
and -rect(t) for symbol 0. See Table 2.2 for the definition of rect(t). 

Case 2: sinc(t) as the basis function. 

Consider next the representation of symbol 1 by +sinc(t) and the representation of symbol 
0 by -sinc(t), which do not interfere with each other in constructing the waveform for the 
binary sequence 10010. Unfortunately, this time around, it is difficult to calculate the total 
waveform energy in the time domain. To overcome this difficulty, we do the calculation in 
the frequency domain. 

To this end, in parts a and b of Figure 2.4, we display the waveform of the sinc function 
in the time domain and its Fourier transform, respectively. On this basis, Figure 2.5 
displays the frequency-domain representation of the binary sequence 10010, with part a of 
the figure displaying the magnitude response |G(f)|, and part b displaying the 
corresponding phase response arg[G(f)] expressed in radians. Then, applying 
Rayleigh’s energy theorem, described in Property 14 in Table 2.1, to part a of Figure 2.5, 
we readily find that the energy of the pulse, ±sinc(t), is equal to one unit, regardless of its 
polarity. The total energy of the sinc-based waveform representing the given binary 
sequence is also exactly five units, confirming what was said at the beginning of this 
example. 
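
The two energy calculations of this example can be reproduced numerically. The sketch below is a minimal check under stated assumptions: symbol k of the sequence is placed at time t = k, the sinc function is taken as sinc(t) = sin(πt)/(πt) so that its Fourier transform is rect(f), and both integrals are approximated by the midpoint rule. The function names are our own.

```python
import cmath
import math

SYMBOLS = [+1, -1, -1, +1, -1]    # sequence 10010 with 1 -> +1 and 0 -> -1


def rect(t):
    """Unit-height, unit-width rectangular pulse centered at t = 0."""
    return 1.0 if abs(t) < 0.5 else 0.0


def energy_rect(steps_per_symbol=1000):
    """Case 1: time-domain energy of the sum of s_k * rect(t - k)."""
    dt = 1.0 / steps_per_symbol
    total = 0.0
    for i in range(steps_per_symbol * len(SYMBOLS)):
        t = -0.5 + (i + 0.5) * dt
        value = sum(s * rect(t - k) for k, s in enumerate(SYMBOLS))
        total += value * value * dt
    return total


def energy_sinc(steps=2000):
    """Case 2: frequency-domain energy of the sum of s_k * sinc(t - k).

    Uses G(f) = rect(f) * sum_k s_k exp(-j 2 pi f k) and Rayleigh's energy
    theorem, so |G(f)|^2 is integrated over -1/2 <= f <= 1/2 only.
    """
    df = 1.0 / steps
    total = 0.0
    for i in range(steps):
        f = -0.5 + (i + 0.5) * df
        spectrum = sum(s * cmath.exp(-2j * math.pi * f * k)
                       for k, s in enumerate(SYMBOLS))
        total += abs(spectrum) ** 2 * df
    return total


if __name__ == "__main__":
    print("energy with rect pulses:", energy_rect())
    print("energy with sinc pulses:", energy_sinc())
```

Both estimates come out at five units to within rounding error, in agreement with Cases 1 and 2.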



Figure 2.4 (a) Sinc pulse g(t). (b) Fourier transform G(f). 

Figure 2.5 (a) Magnitude spectrum of the sequence 10010. (b) Phase spectrum 
of the sequence. 

Observations 

The dual basis functions, rect(t) and sinc(t), are dilated to their simplest forms, each 
of which has an energy of one unit, hence the equality of the results presented under 
Cases 1 and 2. 

Examining the waveform g(t) in Figure 2.3, we clearly see the discrimination 
between binary symbols 1 and 0. On the other hand, it is the phase response 
arg[G(f)] in part b of Figure 2.5 that shows the discrimination between binary 
symbols 1 and 0. 

Example 2: Unit Gaussian Pulse 

Typically, a pulse signal g(t) and its Fourier transform G(f) have different mathematical 
forms. This observation is illustrated by the Fourier-transform pair studied in Example 1. 
In this second example, we consider an exception to this observation. In particular, we use 
the differentiation property of the Fourier transform to derive the particular form of a pulse 
signal that has the same mathematical form as its own Fourier transform. 

Let g(t) denote the pulse signal expressed as a function of time t and G(f) denote its 
Fourier transform. Differentiating the Fourier transform formula of (2.6) with respect to 
frequency f yields 

$$-j2\pi t\,g(t) \rightleftharpoons \frac{d}{df}G(f)$$

or, equivalently, 

$$2\pi t\,g(t) \rightleftharpoons j\,\frac{d}{df}G(f) \tag{2.19}$$
Use of the Fourier-transform property on differentiation in the time domain listed in Table 
2.1 yields 

$$\frac{d}{dt}g(t) \rightleftharpoons j2\pi f\,G(f) \tag{2.20}$$

Suppose we now require the left-hand side of (2.20) to equal the negative of the left-hand side of (2.19): 

$$\frac{d}{dt}g(t) = -2\pi t\,g(t) \tag{2.21}$$

Then, in a corresponding way, it follows that the right-hand sides of these two equations 
must (after canceling the common multiplying factor j) satisfy the condition 

$$\frac{d}{df}G(f) = -2\pi f\,G(f) \tag{2.22}$$

Equations (2.21) and (2.22) show that the pulse signal g(t) and its Fourier transform G(f) 
have exactly the same mathematical form. In other words, provided that the pulse signal 
g(t) satisfies the differential equation (2.21), then G(f) = g(f), where g(f) is obtained from 
g(t) simply by substituting f for t. Solving (2.21) for g(t), with the central ordinate normalized to g(0) = 1, we obtain 

$$g(t) = \exp(-\pi t^2) \tag{2.23}$$

which has a bell-shaped waveform, as illustrated in Figure 2.6. Such a pulse is called a 
Gaussian pulse, the name of which follows from the similarity of the function g(t) to the 
Gaussian probability density function of probability theory, to be discussed in Chapter 3. 
By applying the Fourier-transform property on the area under g(t) listed in Table 2.1, we 
have 

$$\int_{-\infty}^{\infty} \exp(-\pi t^2)\,dt = 1 \tag{2.24}$$

When the central ordinate and the area under the curve of a pulse are both unity, as in 
(2.23) and (2.24), we say that the Gaussian pulse is a unit pulse. Therefore, we may state 
that the unit Gaussian pulse is its own Fourier transform, as shown by 

$$\exp(-\pi t^2) \rightleftharpoons \exp(-\pi f^2)$$
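The self-transform property of the unit Gaussian pulse can be checked numerically. The sketch below is illustrative only: it approximates the Fourier integral by a Riemann sum over a truncated window (the function names, the window half-width, and the step size are assumptions of the sketch) and compares the result with exp(-πf²) at a few frequencies.

```python
import math

def gaussian(t):
    """Unit Gaussian pulse exp(-pi t^2)."""
    return math.exp(-math.pi * t * t)

def fourier_transform_even(g, f, half_width=6.0, dt=1e-3):
    """Approximate G(f) = integral of g(t) exp(-j 2 pi f t) dt for a real, even g(t).

    For an even pulse the sine (imaginary) part integrates to zero, so only the
    cosine part is summed. The integral is truncated to [-half_width, half_width].
    """
    n = int(round(2 * half_width / dt))
    total = 0.0
    for i in range(n + 1):
        t = -half_width + i * dt
        total += g(t) * math.cos(2 * math.pi * f * t)
    return total * dt

test_frequencies = [0.0, 0.25, 0.5, 1.0, 1.5]
max_error = max(abs(fourier_transform_even(gaussian, f) - gaussian(f))
                for f in test_frequencies)
print(max_error)
```

The value at f = 0 is also a check on the unit-area property, since G(0) equals the area under g(t).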



Figure 2.6 Gaussian pulse. 


The Fourier Transform 
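Table 2.1 Fourier-transform theorems 

Property: Mathematical description 

1. Linearity: $a g_1(t) + b g_2(t) \rightleftharpoons a G_1(f) + b G_2(f)$, where a and b are constants 
2. Dilation: $g(at) \rightleftharpoons \frac{1}{|a|} G\!\left(\frac{f}{a}\right)$, where a is a constant 
3. Duality: If $g(t) \rightleftharpoons G(f)$, then $G(t) \rightleftharpoons g(-f)$ 
4. Time shifting: $g(t - t_0) \rightleftharpoons G(f)\exp(-j2\pi f t_0)$ 
5. Frequency shifting: $g(t)\exp(j2\pi f_0 t) \rightleftharpoons G(f - f_0)$ 
6. Area under g(t): $\int_{-\infty}^{\infty} g(t)\,dt = G(0)$ 
7. Area under G(f): $g(0) = \int_{-\infty}^{\infty} G(f)\,df$ 
8. Differentiation in the time domain: $\frac{d}{dt} g(t) \rightleftharpoons j2\pi f\,G(f)$ 
9. Integration in the time domain: $\int_{-\infty}^{t} g(\tau)\,d\tau \rightleftharpoons \frac{1}{j2\pi f} G(f) + \frac{G(0)}{2}\delta(f)$ 
10. Conjugate functions: If $g(t) \rightleftharpoons G(f)$, then $g^*(t) \rightleftharpoons G^*(-f)$ 
11. Multiplication in the time domain: $g_1(t) g_2(t) \rightleftharpoons \int_{-\infty}^{\infty} G_1(\lambda) G_2(f - \lambda)\,d\lambda$ 
12. Convolution in the time domain: $\int_{-\infty}^{\infty} g_1(\tau) g_2(t - \tau)\,d\tau \rightleftharpoons G_1(f) G_2(f)$ 
13. Correlation theorem: $\int_{-\infty}^{\infty} g_1(t) g_2^*(t - \tau)\,dt \rightleftharpoons G_1(f) G_2^*(f)$ 
14. Rayleigh’s energy theorem: $\int_{-\infty}^{\infty} |g(t)|^2\,dt = \int_{-\infty}^{\infty} |G(f)|^2\,df$ 
15. Parseval’s power theorem for periodic signal of period $T_0$: $\frac{1}{T_0}\int_{-T_0/2}^{T_0/2} |g(t)|^2\,dt = \sum_{n=-\infty}^{\infty} |G(f_n)|^2$, $f_n = n/T_0$ 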
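Table 2.2 Fourier-transform pairs and commonly used time functions 

Time function: Fourier transform 

1. $\mathrm{rect}\!\left(\frac{t}{T}\right) \rightleftharpoons T\,\mathrm{sinc}(fT)$ 
2. $\mathrm{sinc}(2Wt) \rightleftharpoons \frac{1}{2W}\,\mathrm{rect}\!\left(\frac{f}{2W}\right)$ 
3. $\exp(-at)u(t),\ a > 0 \rightleftharpoons \frac{1}{a + j2\pi f}$ 
4. $\exp(-a|t|),\ a > 0 \rightleftharpoons \frac{2a}{a^2 + (2\pi f)^2}$ 
5. $\exp(-\pi t^2) \rightleftharpoons \exp(-\pi f^2)$ 
6. $1 - \frac{|t|}{T}$ for $|t| < T$, and 0 for $|t| \ge T$ $\rightleftharpoons T\,\mathrm{sinc}^2(fT)$ 
7. $\delta(t) \rightleftharpoons 1$ 
8. $1 \rightleftharpoons \delta(f)$ 
9. $\delta(t - t_0) \rightleftharpoons \exp(-j2\pi f t_0)$ 
10. $\exp(j2\pi f_c t) \rightleftharpoons \delta(f - f_c)$ 
11. $\cos(2\pi f_c t) \rightleftharpoons \frac{1}{2}[\delta(f - f_c) + \delta(f + f_c)]$ 
12. $\sin(2\pi f_c t) \rightleftharpoons \frac{1}{2j}[\delta(f - f_c) - \delta(f + f_c)]$ 
13. $\mathrm{sgn}(t) \rightleftharpoons \frac{1}{j\pi f}$ 
14. $\frac{1}{\pi t} \rightleftharpoons -j\,\mathrm{sgn}(f)$ 
15. $u(t) \rightleftharpoons \frac{1}{2}\delta(f) + \frac{1}{j2\pi f}$ 
16. $\sum_{i=-\infty}^{\infty} \delta(t - iT_0) \rightleftharpoons f_0 \sum_{n=-\infty}^{\infty} \delta(f - nf_0)$, $f_0 = \frac{1}{T_0}$ 

Notes: 

Unit step function: u(t) = 1 for t > 0, 1/2 for t = 0, and 0 for t < 0 

Dirac delta function: δ(t) = 0 for t ≠ 0 and $\int_{-\infty}^{\infty} \delta(t)\,dt = 1$ 

Rectangular function: rect(t) = 1 for -1/2 < t ≤ 1/2, and 0 otherwise 

Signum function: sgn(t) = +1 for t > 0, 0 for t = 0, and -1 for t < 0 

Sinc function: sinc(t) = sin(πt)/(πt) 

Gaussian function: g(t) = exp(-πt²) 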
The Inverse Relationship between Time-Domain and 
Frequency-Domain Representations 


The time-domain and frequency-domain descriptions of a signal are inversely related. In 
this context, we may make four important statements: 

If the time-domain description of a signal is changed, the frequency-domain 
description of the signal is changed in an inverse manner, and vice versa. This 
inverse relationship prevents arbitrary specifications of a signal in both domains. In 
other words: we may specify an arbitrary function of time or an arbitrary spectrum, 
but we cannot specify both of them together. 


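If a signal is strictly limited in frequency, then the time-domain description of the 
signal will trail on indefinitely, even though its amplitude may assume a 
progressively smaller value. To be specific, we say: a signal is strictly limited in 
frequency (i.e., strictly band limited) if its Fourier transform is exactly zero outside 
a finite band of frequencies. 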


Consider, for example, the band-limited sinc pulse defined by 

$$\mathrm{sinc}(t) = \frac{\sin(\pi t)}{\pi t}$$

whose waveform and spectrum are respectively shown in Figure 2.4: part a shows 
that the sinc pulse is asymptotically limited in time and part b of the figure shows 
that the sinc pulse is indeed strictly band limited, thereby confirming statement 2. 

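In a dual manner to statement 2, we say: if a signal is strictly limited in time (i.e., 
exactly zero outside a finite time interval), then its spectrum is infinite in extent, 
even though the magnitude spectrum may assume a progressively smaller value. 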


This third statement is exemplified by a rectangular pulse, the waveform and 
spectrum of which are defined in accordance with item 1 in Table 2.2. 

In light of the duality described under statements 2 and 3, we now make the final 
statement: a signal cannot be strictly limited in both time and frequency. 


The statements we have just made have an important bearing on the bandwidth of a signal, 
which provides a measure of the extent of significant spectral content of the signal for 
positive frequencies. When the signal is strictly band limited, the bandwidth is well 
defined. For example, the sinc pulse sinc(2Wt) has a bandwidth equal to W. However, 
when the signal is not strictly band limited, as is often the case, we encounter difficulty in 
defining the bandwidth of the signal. The difficulty arises because the meaning of 
“significant” attached to the spectral content of the signal is mathematically imprecise. 
Consequently, there is no universally accepted definition of bandwidth. It is in this sense 
that we speak of the “bandwidth dilemma.” 
Nevertheless, there are some commonly used definitions for bandwidth, as discussed 
next. When the spectrum of a signal is symmetric with a main lobe bounded by well- 
defined nulls (i.e., frequencies at which the spectrum is zero), we may use the main lobe as 
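the basis for defining the bandwidth of the signal. Specifically: if the signal is low-pass, 
the bandwidth is defined as one half the total width of the main spectral lobe, since only 
one half of this lobe lies in the positive-frequency region. 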


For example, a rectangular pulse of duration T seconds has a main spectral lobe of total 
width (2/T) hertz centered at the origin. Accordingly, we may define the bandwidth of this 
rectangular pulse as (1/T) hertz. 

If, on the other hand, the signal is band-pass with main spectral lobes centered around 
$\pm f_c$, where $f_c$ is large enough, the bandwidth is defined as the width of the main lobe for 
positive frequencies. This definition of bandwidth is called the null-to-null bandwidth. 
Consider, for example, a radio-frequency (RF) pulse of duration T seconds and frequency 
$f_c$, shown in Figure 2.7. The spectrum of this pulse has main spectral lobes of width (2/T) 
hertz centered around $\pm f_c$, where it is assumed that $f_c$ is large compared with (1/T). Hence, 
we define the null-to-null bandwidth of the RF pulse of Figure 2.7 as (2/T) hertz. 

On the basis of the definitions presented here, we may state that shifting the spectral 
content of a low-pass signal by a sufficiently large frequency has the effect of doubling the 
bandwidth of the signal; this frequency translation is attained by using the process of 
modulation. Basically, the modulation moves the spectral content of the signal for negative 
frequencies into the positive frequency region, whereupon the negative frequencies 
become physically measurable. 

Another popular definition of bandwidth is the 3 dB bandwidth. Specifically, if the 
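signal is low-pass, we say: the 3 dB bandwidth is the separation between zero frequency, 
where the magnitude spectrum attains its peak value, and the positive frequency at which 
the magnitude spectrum drops to 1/√2 of its peak value. 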



Figure 2.7 Magnitude spectrum of the RF pulse, showing the null-to-null bandwidth to be 2/T, 
centered on the mid-band frequency $f_c$. 
For example, the decaying exponential function exp(-at) has a 3 dB bandwidth of (a/2π) 
hertz. 
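As a short worked check of this value, the following derivation assumes the one-sided pulse exp(-at)u(t) with a > 0, whose transform is item 3 of Table 2.2; the symbol B for the 3 dB bandwidth is introduced here for illustration only.

```latex
G(f) = \frac{1}{a + j2\pi f}
\quad\Longrightarrow\quad
|G(f)| = \frac{1}{\sqrt{a^2 + (2\pi f)^2}}, \qquad |G(0)| = \frac{1}{a}

|G(B)| = \frac{1}{\sqrt{2}}\,|G(0)|
\quad\Longrightarrow\quad
a^2 + (2\pi B)^2 = 2a^2
\quad\Longrightarrow\quad
B = \frac{a}{2\pi}\ \text{hertz}
```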

If, on the other hand, the signal is of a band-pass kind, centered at $\pm f_c$, the 3 dB 
bandwidth is defined as the separation (along the positive frequency axis) between the two 
frequencies at which the magnitude spectrum of the signal drops to 1/√2 of its peak value 
at $f_c$. 

Regardless of whether we have a low-pass or band-pass signal, the 3 dB bandwidth has 
the advantage that it can be read directly from a plot of the magnitude spectrum. However, 
it has the disadvantage that it may be misleading if the magnitude spectrum has slowly 
decreasing tails. 


For any family of pulse signals that differ by a time-scaling factor, the product of the 
signal’s duration and its bandwidth is always a constant, as shown by 

(duration) × (bandwidth) = constant 


This product is called the time-bandwidth product. The constancy of the time-bandwidth 
product is another manifestation of the inverse relationship that exists between the 
time-domain and frequency-domain descriptions of a signal. In particular, if the duration of a 
pulse signal is decreased by reducing the time scale by a factor a, the frequency scale of 
the signal’s spectrum, and therefore the bandwidth of the signal, is increased by the same 
factor a. This statement follows from the dilation property of the Fourier transform 
(defined in Property 2 of Table 2.1). The time-bandwidth product of the signal is therefore 
maintained constant. For example, a rectangular pulse of duration T seconds has a 
bandwidth (defined on the basis of the positive-frequency part of the main lobe) equal to 
(1/T) hertz; in this example, the time-bandwidth product of the pulse equals unity. 

The important point to take from this discussion is that whatever definitions we use for 
the bandwidth and duration of a signal, the time-bandwidth product remains constant over 
certain classes of pulse signals; the choice of particular definitions for bandwidth and 
duration merely changes the value of the constant. 


To put matters pertaining to the bandwidth and duration of a signal on a firm mathematical 
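basis, we first introduce the following definition for bandwidth: the rms bandwidth is the 
square root of the second moment of a normalized form of the squared magnitude spectrum 
of the signal, taken about a suitably chosen frequency. 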


To be specific, we assume that the signal g(t) is of a low-pass kind, in which case the 
second moment is taken about the origin f = 0. The squared magnitude spectrum of the 
signal is denoted by $|G(f)|^2$. To formulate a nonnegative function, the total area under 
whose curve is unity, we use the normalizing function 

$$\int_{-\infty}^{\infty} |G(f)|^2\,df$$
We thus mathematically define the rms bandwidth of a low-pass signal g(t) with Fourier 
transform G(f) as 

$$W_{\mathrm{rms}} = \left( \frac{\int_{-\infty}^{\infty} f^2 |G(f)|^2\,df}{\int_{-\infty}^{\infty} |G(f)|^2\,df} \right)^{1/2} \tag{2.26}$$

which describes the dispersion of the spectrum G(f) around f = 0. An attractive feature of 
the rms bandwidth $W_{\mathrm{rms}}$ is that it lends itself readily to mathematical evaluation. However, it is 
not as easily measurable in the laboratory. 

In a manner corresponding to the rms bandwidth, the rms duration of the signal g(t) is 
mathematically defined by 

$$T_{\mathrm{rms}} = \left( \frac{\int_{-\infty}^{\infty} t^2 |g(t)|^2\,dt}{\int_{-\infty}^{\infty} |g(t)|^2\,dt} \right)^{1/2} \tag{2.27}$$

where it is assumed that the signal g(t) is centered around the origin t = 0. In Problem 2.7, 
it is shown that, using the rms definitions of (2.26) and (2.27), the time-bandwidth product 
takes the form 

$$T_{\mathrm{rms}} W_{\mathrm{rms}} \ge \frac{1}{4\pi}$$

In Problem 2.7, it is also shown that the Gaussian pulse exp(-πt²) satisfies this condition 
exactly with the equality sign. 
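The equality case can be checked numerically for the unit Gaussian pulse. The sketch below is illustrative: the function names, integration window, and step size are assumptions, and it relies on the fact that the unit Gaussian pulse and its Fourier transform have the same form, so the same squared-magnitude function serves in both domains.

```python
import math

def rms_width(density, half_width=6.0, step=1e-3):
    """Square root of the second moment about the origin of a normalized density.

    `density(u)` plays the role of |g(t)|^2 or |G(f)|^2; it is normalized by
    dividing by its own area, as in the rms definitions (2.26) and (2.27).
    """
    n = int(round(2 * half_width / step))
    second_moment = 0.0
    area = 0.0
    for i in range(n + 1):
        u = -half_width + i * step
        d = density(u)
        second_moment += u * u * d
        area += d
    return math.sqrt(second_moment / area)

def gaussian_squared(u):
    """|exp(-pi u^2)|^2, valid for both g(t) and G(f) of the unit Gaussian pulse."""
    return math.exp(-2.0 * math.pi * u * u)

t_rms = rms_width(gaussian_squared)
w_rms = rms_width(gaussian_squared)
product = t_rms * w_rms
print(product, 1.0 / (4.0 * math.pi))
```

Both printed numbers should agree to many decimal places, consistent with the Gaussian pulse meeting the bound with equality.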


The Dirac Delta Function 


Strictly speaking, the theory of the Fourier transform, presented in Section 2.3, is 
applicable only to time functions that satisfy the Dirichlet conditions. As mentioned 
previously, such functions naturally include energy signals. However, it would be highly 
desirable to extend this theory in two ways: 

To combine the Fourier series and Fourier transform into a unified theory, so that the 
Fourier series may be treated as a special case of the Fourier transform. 

To include power signals in the list of signals to which we may apply the Fourier 
transform. A signal g(t) is said to be a power signal if the condition 

$$\frac{1}{T}\int_{-T/2}^{T/2} |g(t)|^2\,dt < \infty$$

holds, where T is the observation interval. 

It turns out that both of these objectives can be met through the “proper use” of the Dirac 
delta function, or unit impulse. 


The Dirac delta function or just delta function, denoted by δ(t), is defined as having 
zero amplitude everywhere except at t = 0, where it is infinitely large in such a way that it 
contains unit area under its curve; that is, 

$$\delta(t) = 0, \quad t \ne 0 \tag{2.29}$$

and 

$$\int_{-\infty}^{\infty} \delta(t)\,dt = 1 \tag{2.30}$$

An implication of this pair of relations is that the delta function δ(t) is an even function of 
time t, centered at the origin t = 0. Perhaps the simplest way of describing the Dirac delta 
function is to view it as the rectangular pulse 

$$g(t) = \frac{1}{T}\,\mathrm{rect}\!\left(\frac{t}{T}\right)$$



whose duration is T and amplitude is 1/T, as illustrated in Figure 2.8. As T approaches 
zero, the rectangular pulse g(t) approaches the Dirac delta function δ(t) in the limit. 

For the delta function to have meaning, however, it has to appear as a factor in the 
integrand of an integral with respect to time, and then, strictly speaking, only when the 
other factor in the integrand is a continuous function of time. Let g(t) be such a function, 
and consider the product of g(t) and the time-shifted delta function $\delta(t - t_0)$. In light of the 
two defining equations (2.29) and (2.30), we may express the integral of this product as 

$$\int_{-\infty}^{\infty} g(t)\,\delta(t - t_0)\,dt = g(t_0) \tag{2.31}$$

The operation indicated on the left-hand side of this equation sifts out the value $g(t_0)$ of the 
function g(t) at time $t = t_0$, where -∞ < t < ∞. Accordingly, (2.31) is referred to as the 
sifting property of the delta function. This property is sometimes used as the defining 
equation of a delta function; in effect, it incorporates (2.29) and (2.30) into a single 
relation. 
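The sifting property can be illustrated numerically by replacing the delta function with the narrow rectangular pulse of width T and height 1/T described earlier. This is a minimal sketch; the function name, the choice of cos(t) as the continuous test function, and the sample count are assumptions made for illustration.

```python
import math

def sift_with_rect_pulse(g, t0, T, samples=1000):
    """Approximate the integral of g(t) * (1/T) rect((t - t0)/T) dt.

    The rectangular pulse has width T and height 1/T (unit area), so the
    integral is the average of g(t) over [t0 - T/2, t0 + T/2]. A midpoint
    rule with `samples` points is used.
    """
    dt = T / samples
    total = 0.0
    for i in range(samples):
        t = t0 - T / 2 + (i + 0.5) * dt
        total += g(t) * (1.0 / T)
    return total * dt

t0 = 0.3
approximations = {T: sift_with_rect_pulse(math.cos, t0, T) for T in (1.0, 0.1, 0.001)}
errors = {T: abs(value - math.cos(t0)) for T, value in approximations.items()}
print(errors)
```

As T shrinks, the integral approaches cos(0.3), the value of the test function at the location of the pulse.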

Noting that the delta function δ(t) is an even function of t, we may rewrite (2.31) so as 
to emphasize its resemblance to the convolution integral, as shown by 

$$\int_{-\infty}^{\infty} g(\tau)\,\delta(t - \tau)\,d\tau = g(t)$$



Figure 2.8 Illustrative example of the Dirac delta function as the 
limiting form of the rectangular pulse (1/T) rect(t/T) as T approaches zero. 
In words, the convolution of any function with the delta function leaves that function 
unchanged. We refer to this statement as the replication property of the delta function. 

It is important to realize that no function in the ordinary sense has the two properties of 
(2.29) and (2.30) or the equivalent sifting property of (2.31). However, we can imagine a 
sequence of functions that have progressively taller and thinner peaks at t = 0, with the 
area under the curve consistently remaining equal to unity; as this progression is being 
performed, the value of the function tends to zero at every point except t = 0, where it 
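tends to infinity, as illustrated in Figure 2.8, for example. We may therefore say: the delta 
function may be viewed as the limiting form of a pulse of unit area as the duration of the 
pulse approaches zero. 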


It is immaterial what sort of pulse shape is used, so long as it is symmetric with respect to 
the origin; this symmetry is needed to maintain the “even” function property of the delta 
function. 

Two other points are noteworthy: 

Applicability of the delta function is not confined to the time domain. Rather, it can 
equally well be applied in the frequency domain; all that we have to do is to replace 
time t by frequency f in the defining equations (2.29) and (2.30). 

The area covered by the delta function defines its “strength.” As such, the units, in 
terms of which the strength is measured, are determined by the specifications of the 
two coordinates that define the delta function. 


The Sinc Function as a Limiting Form of the Delta Function 
in the Time Domain 

As another illustrative example, consider the scaled sinc function 2W sinc(2Wt), whose 
waveform covers an area equal to unity for all W. 

Figure 2.9 displays the evolution of this time function toward the delta function as the 
parameter W is varied in three stages: W = 1, W = 2, and W = 5. Referring back to Figure 
2.4, we may infer that as the parameter W characterizing the sinc pulse is increased, the 
amplitude of the pulse at time t = 0 increases linearly, while at the same time the duration 
of the main lobe of the pulse decreases inversely. With this objective in mind, as the 
parameter W is progressively increased, Figure 2.9 teaches us two important things: 

The scaled sinc function becomes more like a delta function. 

The constancy of the function’s spectrum is maintained at unity across an 
increasingly wider frequency band, in accordance with the constraint that the area 
under the function is to remain constant at unity; see Property 6 of Table 2.1 for a 
validation of this point. 

Based on the trend exhibited in Figure 2.9, we may write 

$$\delta(t) = \lim_{W \to \infty} 2W\,\mathrm{sinc}(2Wt)$$

which, in addition to the rectangular pulse considered in Figure 2.8, is another way of 
realizing a delta function in the time domain. 
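The limiting behaviour can be probed numerically by letting the scaled sinc function act on a smooth test function and checking that the result tends to the value of that function at t = 0. The sketch below is illustrative only: the unit Gaussian pulse is an assumed test function, and the closed-form comparison erf(√π W) follows from Table 2.1 and Table 2.2, since 2W sinc(2Wt) has the transform rect(f/2W) and the integral therefore equals the area of exp(-πf²) over -W ≤ f ≤ W.

```python
import math

def sinc(t):
    """Normalized sinc: sin(pi t)/(pi t), with sinc(0) = 1."""
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)

def gaussian(t):
    """Assumed smooth test function with gaussian(0) = 1."""
    return math.exp(-math.pi * t * t)

def sinc_sift(W, half_width=6.0, dt=1e-3):
    """Approximate the integral of 2W sinc(2W t) * gaussian(t) dt."""
    n = int(round(2 * half_width / dt))
    total = 0.0
    for i in range(n + 1):
        t = -half_width + i * dt
        total += 2.0 * W * sinc(2.0 * W * t) * gaussian(t)
    return total * dt

results = {W: sinc_sift(W) for W in (1, 2, 5)}
print(results)
```

The values climb toward gaussian(0) = 1 as W goes from 1 to 5, in line with the trend described for Figure 2.9.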
Figure 2.9 Evolution of the sinc function 2W sinc(2Wt) toward the delta function as the 
parameter W progressively increases. 
Evolution of the Sum of Complex Exponentials toward the Delta Function in 
the Frequency Domain 

For yet another entirely different example, consider the infinite summation term 
$\sum_{m=-\infty}^{\infty} \exp(j2\pi mf)$ over the interval -1/2 ≤ f < 1/2. Using Euler’s formula 

$$\exp(j2\pi mf) = \cos(2\pi mf) + j\sin(2\pi mf)$$

we may express the given summation as 

$$\sum_{m=-\infty}^{\infty} \exp(j2\pi mf) = \sum_{m=-\infty}^{\infty} \cos(2\pi mf) + j\sum_{m=-\infty}^{\infty} \sin(2\pi mf)$$

The imaginary part of the summation is zero for two reasons. First, sin(2πmf) is zero for 
m = 0. Second, since sin(-2πmf) = -sin(2πmf), the remaining imaginary terms cancel 
each other in pairs. Therefore, 

$$\sum_{m=-\infty}^{\infty} \exp(j2\pi mf) = \sum_{m=-\infty}^{\infty} \cos(2\pi mf)$$

Figure 2.10 plots this real-valued summation versus frequency f over the interval 
-1/2 ≤ f < 1/2 for three ranges of m: 

-5 ≤ m ≤ 5 
-10 ≤ m ≤ 10 
-20 ≤ m ≤ 20 

Building on the results exhibited in Figure 2.10, we may go on to say 

$$\delta(f) = \sum_{m=-\infty}^{\infty} \cos(2\pi mf), \quad -\frac{1}{2} \le f < \frac{1}{2} \tag{2.34}$$

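which is one way of realizing a delta function in the frequency domain. Note that the area 
under the summation term on the right-hand side of (2.34) is equal to unity; we say so 
because 

$$
\begin{aligned}
\int_{-1/2}^{1/2} \sum_{m=-\infty}^{\infty} \cos(2\pi mf)\,df
&= \sum_{m=-\infty}^{\infty} \int_{-1/2}^{1/2} \cos(2\pi mf)\,df \\
&= \sum_{m=-\infty}^{\infty} \left[ \frac{\sin(2\pi mf)}{2\pi m} \right]_{f=-1/2}^{1/2} \\
&= \sum_{m=-\infty}^{\infty} \frac{\sin(\pi m)}{\pi m} \\
&= 1
\end{aligned}
$$

where the last step follows because the term sin(πm)/(πm) equals 1 for m = 0 and 0 otherwise. 

This result, formulated in the frequency domain, confirms (2.34) as one way of defining 
the delta function δ(f). 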
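The behaviour of the truncated summation can be reproduced with a few lines of code. The sketch below is illustrative (the function names and the number of integration points are assumptions): for each truncation limit M it evaluates the peak of the partial sum at f = 0 and its area over one period.

```python
import math

def partial_sum(f, M):
    """Sum of cos(2 pi m f) for m = -M, ..., M."""
    return sum(math.cos(2.0 * math.pi * m * f) for m in range(-M, M + 1))

def area_over_one_period(M, points=2000):
    """Midpoint-rule estimate of the integral of partial_sum(f, M) over [-1/2, 1/2]."""
    df = 1.0 / points
    return sum(partial_sum(-0.5 + (i + 0.5) * df, M) for i in range(points)) * df

peaks = {M: partial_sum(0.0, M) for M in (5, 10, 20)}
areas = {M: area_over_one_period(M) for M in (5, 10, 20)}
print(peaks)
print(areas)
```

The peak grows as 2M + 1 while the area stays at unity, which is the taller-and-thinner progression expected of a sequence of functions converging toward a delta function.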
Figure 2.10 Evolution of the sum of complex exponentials toward a delta function in the 
frequency domain as the range of m becomes increasingly larger. 
Fourier Transforms of Periodic Signals 


We began the study of Fourier analysis by reviewing the Fourier series expansion of 
periodic signals, which, in turn, paved the way for the formulation of the Fourier 
transform. Now that we have equipped ourselves with the Dirac delta function, we would 
like to revisit the Fourier series and show that it can indeed be treated as a special case of 
the Fourier transform. 

To this end, let g(t) be a pulse-like function, which equals a periodic signal $g_{T_0}(t)$ over 
one period $T_0$ of the signal and is zero elsewhere, as shown by 

$$g(t) = \begin{cases} g_{T_0}(t), & -\dfrac{T_0}{2} < t \le \dfrac{T_0}{2} \\ 0, & \text{elsewhere} \end{cases} \tag{2.35}$$

The periodic signal $g_{T_0}(t)$ itself may be expressed in terms of the function g(t) as an 
infinite summation, as shown by 

$$g_{T_0}(t) = \sum_{m=-\infty}^{\infty} g(t - mT_0) \tag{2.36}$$

In light of the definition of the pulselike function g(t) in (2.35), we may view this function 
as a generating function, so called as it generates the periodic signal $g_{T_0}(t)$ in accordance 
with (2.36). 

Clearly, the generating function g(t) is Fourier transformable; let G(f) denote its 
Fourier transform. Correspondingly, let $G_{T_0}(f)$ denote the Fourier transform of the 
periodic signal $g_{T_0}(t)$. Hence, taking the Fourier transforms of both sides of (2.36) and 
applying the time-shifting property of the Fourier transform (Property 4 of Table 2.1), we 
may write 

$$G_{T_0}(f) = G(f) \sum_{m=-\infty}^{\infty} \exp(-j2\pi mfT_0), \quad -\infty < f < \infty \tag{2.37}$$

where we have taken G(f) outside the summation because it is independent of m. 
In Example 4, we showed that 

$$\sum_{m=-\infty}^{\infty} \exp(j2\pi mf) = \sum_{m=-\infty}^{\infty} \cos(2\pi mf) = \delta(f), \quad -\frac{1}{2} \le f < \frac{1}{2}$$

Let this result be expanded to cover the entire frequency range, as shown by 

$$\sum_{m=-\infty}^{\infty} \exp(j2\pi mf) = \sum_{n=-\infty}^{\infty} \delta(f - n), \quad -\infty < f < \infty \tag{2.38}$$

Equation (2.38) (see Problem 2.8c) represents a Dirac comb, consisting of an infinite 
sequence of uniformly spaced delta functions, as depicted in Figure 2.11. 
Figure 2.11 (a) Dirac comb. (b) Spectrum of the Dirac comb. 


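Next, introducing the frequency-scaling factor $f_0 = 1/T_0$ into (2.38), we 
correspondingly write 

$$\sum_{m=-\infty}^{\infty} \exp(j2\pi mfT_0) = f_0 \sum_{n=-\infty}^{\infty} \delta(f - nf_0), \quad -\infty < f < \infty \tag{2.39}$$

Hence, substituting (2.39) into the right-hand side of (2.37), and noting that the sum over all 
m is unchanged when m is replaced by -m, we get 

$$
\begin{aligned}
G_{T_0}(f) &= f_0\,G(f) \sum_{n=-\infty}^{\infty} \delta(f - nf_0) \\
&= f_0 \sum_{n=-\infty}^{\infty} G(f_n)\,\delta(f - f_n)
\end{aligned}
\tag{2.40}
$$

where $f_n = nf_0$. 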

What we have to show next is that the inverse Fourier transform of $G_{T_0}(f)$ defined in (2.40) 
is exactly the same as in the Fourier series formula of (2.14). Specifically, substituting (2.40) 
into the inverse Fourier transform formula of (2.17), we get 

$$g_{T_0}(t) = f_0 \int_{-\infty}^{\infty} \left[ \sum_{n=-\infty}^{\infty} G(f_n)\,\delta(f - f_n) \right] \exp(j2\pi ft)\,df$$
Interchanging the order of summation and integration, and then invoking the sifting 
property of the Dirac delta function (this time in the frequency domain), we may go on to 
write 

$$
\begin{aligned}
g_{T_0}(t) &= f_0 \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} G(f_n) \exp(j2\pi ft)\,\delta(f - f_n)\,df \\
&= f_0 \sum_{n=-\infty}^{\infty} G(f_n) \exp(j2\pi f_n t)
\end{aligned}
$$

which is an exact rewrite of (2.14) with $f_0 = \Delta f$. Equivalently, in light of (2.36), we may 
formulate the Fourier transform pair 

$$\sum_{m=-\infty}^{\infty} g(t - mT_0) = f_0 \sum_{n=-\infty}^{\infty} G(f_n) \exp(j2\pi f_n t) \tag{2.41}$$

The result derived in (2.41) is one form of Poisson’s sum formula. 

We have thus demonstrated that the Fourier series representation of a periodic signal is 
embodied in the Fourier transformation of (2.16) and (2.17), provided, of course, we 
permit the use of the Dirac delta function. In so doing, we have closed the “circle” by 
going from the Fourier series to the Fourier transform, and then back to the Fourier series. 
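Poisson's sum formula (2.41) lends itself to a direct numerical check. The sketch below is illustrative: it assumes the unit Gaussian pulse as the generating function (so that G(f) is known in closed form), an arbitrary period T0 = 2, and truncation of both infinite sums to a finite number of terms; all function names are hypothetical.

```python
import math

def g(t):
    """Generating pulse: the unit Gaussian pulse."""
    return math.exp(-math.pi * t * t)

def G(f):
    """Its Fourier transform, which has the same mathematical form."""
    return math.exp(-math.pi * f * f)

def periodic_extension(t, T0, terms=50):
    """Left-hand side of (2.41): sum over m of g(t - m T0), truncated to |m| <= terms."""
    return sum(g(t - m * T0) for m in range(-terms, terms + 1))

def fourier_series_side(t, T0, terms=50):
    """Right-hand side of (2.41): f0 * sum over n of G(n f0) exp(j 2 pi n f0 t).

    G is real and even here, so the imaginary parts cancel in pairs and only
    the cosine terms need to be summed.
    """
    f0 = 1.0 / T0
    return f0 * sum(G(n * f0) * math.cos(2.0 * math.pi * n * f0 * t)
                    for n in range(-terms, terms + 1))

T0 = 2.0
sample_times = [0.0, 0.3, 1.0, 1.7]
max_mismatch = max(abs(periodic_extension(t, T0) - fourier_series_side(t, T0))
                   for t in sample_times)
print(max_mismatch)
```

The two sides agree to within floating-point rounding at every sampled time instant.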


Consider a Fourier transformable pulselike signal g(t) with its Fourier transform denoted 
by G(f). Setting $f_n = nf_0$ in (2.41) and using (2.38), we may express Poisson’s sum formula as 

$$\sum_{m=-\infty}^{\infty} g(t - mT_0) \rightleftharpoons f_0 \sum_{n=-\infty}^{\infty} G(nf_0)\,\delta(f - nf_0) \tag{2.42}$$

where $f_0 = 1/T_0$. The summation on the left-hand side of this Fourier-transform pair is a 
periodic signal with period $T_0$. The summation on the right-hand side of the pair is a 
uniformly sampled version of the spectrum G(f). We may therefore make the following 
statement: uniform sampling in the frequency domain corresponds to periodicity in the time domain. 


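Applying the duality property of the Fourier transform (Property 3 of Table 2.1) to (2.42), 
we may also write 

$$T_0 \sum_{m=-\infty}^{\infty} g(mT_0)\,\delta(t - mT_0) \rightleftharpoons \sum_{n=-\infty}^{\infty} G(f - nf_0)$$

in light of which we may make the following dual statement: uniform sampling in the time 
domain corresponds to periodicity in the frequency domain. 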
Transmission of Signals through Linear Time-Invariant Systems 


A system refers to any physical entity that produces an output signal in response to an 
input signal. It is customary to refer to the input signal as the excitation and to the output 
signal as the response. In a linear system, the principle of superposition holds; that is, the 
response of a linear system to a number of excitations applied simultaneously is equal to 
the sum of the responses of the system when each excitation is applied individually. 

In the time domain, a linear system is usually described in terms of its impulse 
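response, which is formally defined as follows: the impulse response of a linear system is the 
response of the system (with zero initial conditions) to a unit impulse or delta function δ(t) 
applied to the input of the system at time t = 0. 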


If the system is also time invariant, then the shape of the impulse response is the same no 
matter when the unit impulse is applied to the system. Thus, with the unit impulse or delta 
function applied to the system at time t = 0, the impulse response of a linear time-invariant 
system is denoted by h(t). 

Suppose that a system described by the impulse response h(t) is subjected to an 
arbitrary excitation x(t), as depicted in Figure 2.12. The resulting response of the system, 
y(t), is defined in terms of the impulse response h(t) by 

$$y(t) = \int_{-\infty}^{\infty} x(\tau)\,h(t - \tau)\,d\tau \tag{2.44}$$

which is called the convolution integral. Equivalently, we may write 

$$y(t) = \int_{-\infty}^{\infty} h(\tau)\,x(t - \tau)\,d\tau \tag{2.45}$$

Equations (2.44) and (2.45) state that convolution is commutative. 

Examining the convolution integral of (2.44), we see that three different time scales are 
involved: excitation time τ, response time t, and system-memory time t - τ. This relation is 
the basis of time-domain analysis of linear time-invariant systems. According to (2.44), 
the present value of the response of a linear time-invariant system is an integral over the 
past history of the input signal, weighted according to the impulse response of the system. 
Thus, the impulse response acts as a memory function of the system. 


A linear system with impulse response h(t) is said to be causal if its impulse response h(t) 
satisfies the condition 

$$h(t) = 0 \quad \text{for } t < 0$$


Figure 2.12 Illustrating the roles of excitation x(t), impulse response h(t), 
and response y(t) in the context of a linear time-invariant system. 
The essence of causality is that no response can appear at the output of the system before 
an excitation is applied to its input. Causality is a necessary requirement for on-line 
operation of the system. In other words, for a system operating in real time to be 
physically realizable, it has to be causal. 

Another important property of a linear system is stability. A necessary and sufficient 
condition for the system to be stable is that its impulse response h(t) must satisfy the 
inequality 

$$\int_{-\infty}^{\infty} |h(t)|\,dt < \infty$$

This requirement follows from the commonly used criterion of bounded input-bounded 
output. Basically, for the system to be stable, its impulse response must be absolutely 
integrable. 
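The sufficiency half of this statement follows in one line from the convolution integral. In the sketch below, M is a symbol introduced here for an assumed bound on the input magnitude, |x(t)| ≤ M for all t.

```latex
|y(t)| = \left| \int_{-\infty}^{\infty} h(\tau)\,x(t - \tau)\,d\tau \right|
\le \int_{-\infty}^{\infty} |h(\tau)|\,|x(t - \tau)|\,d\tau
\le M \int_{-\infty}^{\infty} |h(\tau)|\,d\tau < \infty
```

Hence every bounded input produces a bounded output whenever the impulse response is absolutely integrable.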


Let X(f), H(f), and Y(f) denote the Fourier transforms of the excitation x(t), impulse 
response h(t), and response y(t), respectively. Then, applying Property 12 of the Fourier 
transform in Table 2.1 to the convolution integral, be it written in the form of (2.44) or 
(2.45), we get 

$$Y(f) = H(f)X(f)$$

Equivalently, we may write 

$$H(f) = \frac{Y(f)}{X(f)} \tag{2.47}$$

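The new frequency function H(f) is called the transfer function or frequency response of 
the system; these two terms are used interchangeably. Based on (2.47), we may now 
formally say: the frequency response of a linear time-invariant system is the ratio of the 
Fourier transform of the response of the system to the Fourier transform of the excitation 
applied to the system. 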


In general, the frequency response H(f) is a complex quantity, so we may express it in the form 

$$H(f) = |H(f)| \exp[j\beta(f)]$$

where |H(f)| is called the magnitude response, and β(f) is the phase response, or simply 
phase. When the impulse response of the system is real valued, the frequency response 
exhibits conjugate symmetry, which means that 

$$|H(f)| = |H(-f)|$$

and 

$$\beta(f) = -\beta(-f)$$

That is, the magnitude response |H(f)| of a linear system with real-valued impulse 
response is an even function of frequency, whereas the phase β(f) is an odd function of 
frequency. 
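The conjugate symmetry can be traced back to the definition of the Fourier transform in a single step. The derivation below is a sketch that uses only the assumption that h(t) is real valued, so that h*(t) = h(t).

```latex
H^*(f) = \left[ \int_{-\infty}^{\infty} h(t)\exp(-j2\pi ft)\,dt \right]^*
       = \int_{-\infty}^{\infty} h(t)\exp(j2\pi ft)\,dt
       = H(-f)

\Longrightarrow\quad |H(-f)| = |H^*(f)| = |H(f)|,
\qquad \beta(-f) = \arg[H^*(f)] = -\beta(f)
```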

In some applications it is preferable to work with the logarithm of H(f) expressed in 
polar form, rather than with H(f) itself. Using ln to denote the natural logarithm, let 

$$\ln H(f) = \alpha(f) + j\beta(f) \tag{2.49}$$

where 

$$\alpha(f) = \ln |H(f)|$$

The function α(f) is called the gain of the system; it is measured in nepers. The phase β(f) 
is measured in radians. Equation (2.49) indicates that the gain α(f) and phase β(f) are, 
respectively, the real and imaginary parts of the (natural) logarithm of the transfer function 
H(f). The gain may also be expressed in decibels (dB) by using the definition 

$$\alpha'(f) = 20\log_{10} |H(f)|$$

The two gain functions α(f) and α′(f) are related by 

$$\alpha'(f) = 8.69\,\alpha(f)$$

That is, 1 neper is equal to 8.69 dB. 
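The conversion factor follows from the change of logarithm base, since 20 log10|H| = 20 log10(e) ln|H|. The short sketch below evaluates it; the function and constant names are illustrative and not taken from the text.

```python
import math

def gain_nepers(magnitude):
    """Gain alpha = ln|H| in nepers."""
    return math.log(magnitude)

def gain_db(magnitude):
    """Gain alpha' = 20 log10|H| in decibels."""
    return 20.0 * math.log10(magnitude)

# Decibels per neper: 20 log10(e), which rounds to 8.69.
NEPER_IN_DB = 20.0 * math.log10(math.e)

half_power_magnitude = 1.0 / math.sqrt(2.0)
print(round(NEPER_IN_DB, 3))
print(round(gain_db(half_power_magnitude), 2),
      round(NEPER_IN_DB * gain_nepers(half_power_magnitude), 2))
```

The second line of output shows the familiar 3 dB drop for a magnitude of 1/√2, computed once directly and once via the neper value.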

As a means of specifying the constancy of the magnitude response |H(f)| or gain α(f) 
of a system, we use the notion of bandwidth. In the case of a low-pass system, the 
bandwidth is customarily defined as the frequency at which the magnitude response |H(f)| 
is 1/√2 times its value at zero frequency or, equivalently, the frequency at which the gain 
α′(f) drops by 3 dB below its value at zero frequency, as illustrated in Figure 2.13a. In the 
case of a band-pass system, the bandwidth is defined as the range of frequencies over 
which the magnitude response |H(f)| remains within 1/√2 times its value at the mid-band 
frequency, as illustrated in Figure 2.13b. 



Figure 2.13 Illustrating the definition of system bandwidth. (a) Low-pass system. 
(b) Band-pass system. 
A necessary and sufficient condition for a function α(f) to be the gain of a causal filter is 
the convergence of the integral 

$$\int_{-\infty}^{\infty} \frac{|\alpha(f)|}{1 + f^2}\,df < \infty \tag{2.51}$$

This condition is known as the Paley-Wiener criterion. The criterion states that provided 
the gain α(f) satisfies the condition of (2.51), then we may associate with this gain a 
suitable phase β(f), such that the resulting filter has a causal impulse response that is zero 
for negative time. In other words, the Paley-Wiener criterion is the frequency-domain 
equivalent of the causality requirement. A system with a realizable gain characteristic may 
have infinite attenuation for a discrete set of frequencies, but it cannot have infinite 
attenuation over a band of frequencies; otherwise, the Paley-Wiener criterion is violated. 
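Two standard cases illustrate how the criterion is applied. Both are sketches chosen here for illustration rather than examples from the text: an ideal low-pass characteristic with a hypothetical cutoff frequency B, and a Gaussian magnitude response.

```latex
% Ideal low-pass filter: |H(f)| = 0 for |f| > B
\alpha(f) = \ln|H(f)| = -\infty \quad \text{for } |f| > B
\quad\Longrightarrow\quad
\int_{-\infty}^{\infty} \frac{|\alpha(f)|}{1 + f^2}\,df = \infty

% Gaussian magnitude response: |H(f)| = \exp(-\pi f^2)
\alpha(f) = -\pi f^2
\quad\Longrightarrow\quad
\frac{|\alpha(f)|}{1 + f^2} = \frac{\pi f^2}{1 + f^2} \to \pi \ \text{as } |f| \to \infty
\quad\Longrightarrow\quad
\text{the integral diverges}
```

In both cases the integral fails to converge, so neither gain characteristic can be paired with a phase that yields a causal impulse response.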


Consider next a linear time-invariant filter with impulse response h(t). We make two 
assumptions: 

Causality, which means that the impulse response h(t) is zero for t < 0. 

Finite support, which means that the impulse response of the filter is of some finite 
duration $T_f$, so that we may write h(t) = 0 for $t \ge T_f$. 

Under these two assumptions, we may express the filter output y(t) produced in response 
to the input x(t) as 

$$y(t) = \int_{0}^{T_f} h(\tau)\,x(t - \tau)\,d\tau \tag{2.52}$$

Let the input x(t), impulse response h(t), and output y(t) be uniformly sampled at the rate 
(1/Δτ) samples per second, so that we may put 

$$t = n\Delta\tau$$

and 

$$\tau = k\Delta\tau$$

where k and n are integers and Δτ is the sampling period. Assuming that Δτ is small 
enough for the product h(τ)x(t - τ) to remain essentially constant for kΔτ ≤ τ < (k + 1)Δτ 
for all values of k and t, we may approximate (2.52) by the convolution sum 

$$y(n\Delta\tau) = \sum_{k=0}^{N-1} h(k\Delta\tau)\,x(n\Delta\tau - k\Delta\tau)\,\Delta\tau$$

where $N\Delta\tau = T_f$. To simplify the notations used in this summation formula, we introduce 
three definitions: 

$$w_k = h(k\Delta\tau)\,\Delta\tau$$
$$x_n = x(n\Delta\tau)$$
$$y_n = y(n\Delta\tau)$$
Figure 2.14 Tapped-delay-line (TDL) filter; also referred to as FIR filter. The sampled input x(nΔτ) is applied to the delay line and the sampled output y(nΔτ) is taken from the summer. 


We may then rewrite the formula for y(nΔτ) in the compact form 

$$y_n = \sum_{k=0}^{N-1} w_k x_{n-k}, \quad n = 0, \pm 1, \pm 2, \ldots \tag{2.53}$$

Equation (2.53) may be realized using the structure shown in Figure 2.14, which consists
of a set of delay elements (each producing a delay of $\Delta\tau$ seconds), a set of multipliers
connected to the delay-line taps, a corresponding set of weights supplied to the
multipliers, and a summer for adding the multiplier outputs. The sequences $x_n$ and $y_n$, for
integer values of $n$ as described in (2.53), are referred to as the input and output sequences,
respectively.
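As a minimal sketch of (2.53), the following Python fragment implements the tapped-delay-line
structure directly and compares its output with the convolution integral (2.52). The test case
(an impulse response h(t) = exp(-t) truncated at T_f = 5 s, a unit-step input, and a sampling
period of 0.01 s) is hypothetical and chosen only because the integral has a simple closed
form; the function name `fir_filter` is likewise illustrative.

```python
import math


def fir_filter(weights, x):
    """Tapped-delay-line filter of (2.53): y[n] = sum over k of weights[k] * x[n - k].

    Input samples before n = 0 are taken to be zero (an assumption of this sketch).
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            if n - k < 0:
                break                      # remaining taps see only zero input
            acc += w * x[n - k]
        y.append(acc)
    return y


# Hypothetical test case: h(t) = exp(-t) for 0 <= t < T_f, driven by a unit step.
dt = 0.01                                  # sampling period (Delta tau), seconds
T_f = 5.0                                  # duration of the impulse response, seconds
N = round(T_f / dt)                        # number of taps, so that N * dt = T_f
weights = [math.exp(-k * dt) * dt for k in range(N)]   # w_k = h(k dt) dt
x = [1.0] * 200                            # unit step sampled at t = n dt

y = fir_filter(weights, x)

# For 0 <= t <= T_f the convolution integral (2.52) gives y(t) = 1 - exp(-t).
n = 100                                    # t = n dt = 1 second
print(f"FIR output: {y[n]:.4f}   exact integral: {1.0 - math.exp(-1.0):.4f}")
```

The two printed values differ slightly because the sum is a rectangular-rule approximation of
the integral; reducing `dt` narrows the gap at the cost of more taps.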

In the digital signal-processing literature, the structure of Figure 2.14 is known as a
finite-duration impulse response (FIR) filter. This filter offers some highly desirable
practical features:

The filter is inherently stable, in the sense that a bounded input sequence produces a
bounded output sequence.

Depending on how the weights $\{w_k\}_{k=0}^{N-1}$ are designated, the filter can perform the
function of a low-pass filter or band-pass filter. Moreover, the phase response of the
filter can be configured to be a linear function of frequency, which means that there
will be no delay distortion.

In a digital realization of the filter, the filter assumes a programmable form whereby 
the application of the filter can be changed merely by making appropriate changes to 
the weights, leaving the structure of the filter completely unchanged; this kind of 
flexibility is not available with analog filters. 
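The linear-phase feature can be illustrated numerically. The sketch below relies on a standard
result that is not derived in this section: weights that are symmetric about the middle tap,
w_k = w_{N-1-k}, give a phase response that is linear in frequency with a delay of (N - 1)/2
samples. The particular weights and the helper name `frequency_response` are hypothetical.

```python
import cmath
import math


def frequency_response(weights, omega):
    """H(e^{j omega}) = sum over k of w_k exp(-j omega k); omega in radians per sample."""
    return sum(w * cmath.exp(-1j * omega * k) for k, w in enumerate(weights))


# Hypothetical symmetric weights (w_k = w_{N-1-k}); any symmetric set would do.
weights = [0.1, 0.2, 0.4, 0.2, 0.1]
delay = (len(weights) - 1) / 2             # delay in samples implied by the symmetry

for omega in (0.1, 0.5, 1.0, 2.0):
    H = frequency_response(weights, omega)
    # Removing the linear-phase factor exp(-j omega delay) should leave a real number.
    residual = H * cmath.exp(1j * omega * delay)
    print(f"omega={omega:.1f}  |H|={abs(H):.4f}  imag(residual)={residual.imag:.1e}")
```

Changing only the entries of `weights` reprograms the filter; the code that evaluates it is
untouched, which mirrors the programmability noted above.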

We will have more to say on the FIR filter in subsequent chapters of the book.

Hilbert Transform


The Fourier transform is particularly useful for evaluating the frequency content of an 
energy signal or, in a limiting sense, that of a power signal. As such, it provides the 
mathematical basis for analyzing and designing frequency-selective filters for the
separation of signals on the basis of their frequency content. Another method of separating 
signals is based on phase selectivity, which uses phase shifts between the pertinent signals 
to achieve the desired separation. A phase shift of special interest in this context is that of 
±90°. In particular, when the phase angles of all components of a given signal are shifted 
by ±90°, the resulting function of time is known as the Hilbert transform of the signal. The
Hilbert transformer is called a quadrature filter to emphasize its distinct
property of providing a ±90° phase shift.

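To be specific, consider a Fourier transformable signal $g(t)$ with its Fourier transform
denoted by $G(f)$. The Hilbert transform of $g(t)$, which we denote by $\hat{g}(t)$, is defined by

$$ \hat{g}(t) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{g(\tau)}{t - \tau}\, d\tau \tag{2.54} $$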


Table 2.3 Hilbert-transform pairs*

Time function               Hilbert transform
--------------------------  ----------------------------------------------------------
$m(t)\cos(2\pi f_c t)$      $m(t)\sin(2\pi f_c t)$
$m(t)\sin(2\pi f_c t)$      $-m(t)\cos(2\pi f_c t)$
$\cos(2\pi f_c t)$          $\sin(2\pi f_c t)$
$\sin(2\pi f_c t)$          $-\cos(2\pi f_c t)$
$\frac{\sin t}{t}$          $\frac{1 - \cos t}{t}$
$\mathrm{rect}(t)$          $-\frac{1}{\pi}\ln\left|\frac{t - 1/2}{t + 1/2}\right|$
$\delta(t)$                 $\frac{1}{\pi t}$
$\frac{1}{1 + t^2}$         $\frac{t}{1 + t^2}$
$\frac{1}{t}$               $-\pi\delta(t)$

Notes: $\delta(t)$ denotes Dirac delta function; rect(t) denotes rectangular function; ln denotes natural logarithm.
* In the first two pairs, it is assumed that $m(t)$ is band limited to the interval $-W \le f \le W$, where $W < f_c$.

Clearly, Hilbert transformation is a linear operation. The inverse Hilbert transform, by
means of which the original signal $g(t)$ is linearly recovered from $\hat{g}(t)$, is defined by

$$ g(t) = -\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\hat{g}(\tau)}{t - \tau}\, d\tau \tag{2.55} $$

The functions $g(t)$ and $\hat{g}(t)$ are said to constitute a Hilbert-transform pair. A short table of
Hilbert-transform pairs is given in Table 2.3 on page 42.

The definition of the Hilbert transform $\hat{g}(t)$ given in (2.54) may be interpreted as the
convolution of $g(t)$ with the time function $1/(\pi t)$. We know from the convolution theorem
listed in Table 2.1 that the convolution of two functions in the time domain is transformed
into the multiplication of their Fourier transforms in the frequency domain.

For the time function $1/(\pi t)$, we have the Fourier-transform pair (see Property 14 in
Table 2.2)

$$ \frac{1}{\pi t} \rightleftharpoons -j\,\mathrm{sgn}(f) $$

where $\mathrm{sgn}(f)$ is the signum function, defined in the frequency domain as

$$ \mathrm{sgn}(f) = \begin{cases} 1, & f > 0 \\ 0, & f = 0 \\ -1, & f < 0 \end{cases} \tag{2.56} $$

It follows, therefore, that the Fourier transform $\hat{G}(f)$ of $\hat{g}(t)$ is given by

$$ \hat{G}(f) = -j\,\mathrm{sgn}(f)\, G(f) \tag{2.57} $$

Equation (2.57) states that given a Fourier transformable signal $g(t)$, we may obtain its
Hilbert transform $\hat{g}(t)$ by passing $g(t)$ through a linear time-invariant system whose
frequency response is equal to $-j\,\mathrm{sgn}(f)$. This system may be considered as one that
produces a phase shift of -90° for all positive frequencies of the input signal and +90° for all
negative frequencies, as in Figure 2.15. The amplitudes of all frequency components in the
signal, however, are unaffected by transmission through the device. Such an ideal system is
referred to as a Hilbert transformer, or quadrature filter.
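A discrete-time illustration of (2.57) is sketched below, assuming the signal is sampled over a
window that holds a whole number of its periods so that the DFT can stand in for the Fourier
transform. The helper names (`dft`, `idft`, `hilbert_transform`) and the choice to set the
zero-frequency and Nyquist bins to zero are assumptions of this sketch, not part of the text.

```python
import cmath
import math


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def hilbert_transform(g):
    """Discrete counterpart of (2.57): multiply the DFT of g by -j sgn(f)."""
    N = len(g)
    G = dft(g)
    G_hat = []
    for k in range(N):
        if k == 0 or 2 * k == N:
            sgn = 0          # zero frequency (and the Nyquist bin when N is even)
        elif 2 * k < N:
            sgn = 1          # bins that represent positive frequencies
        else:
            sgn = -1         # bins that represent negative frequencies
        G_hat.append(-1j * sgn * G[k])
    return [v.real for v in idft(G_hat)]


N = 64
g = [math.cos(2 * math.pi * 4 * n / N) for n in range(N)]      # 4 cycles in the window
g_hat = hilbert_transform(g)
expected = [math.sin(2 * math.pi * 4 * n / N) for n in range(N)]
print(max(abs(a - b) for a, b in zip(g_hat, expected)))
```

The cosine is turned into a sine of the same frequency, in agreement with the cosine and sine
pair listed among the Hilbert-transform pairs.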


The Hilbert transform differs from the Fourier transform in that it operates exclusively in 
the time domain. It has a number of useful properties of its own, some of which are listed 
next. The signal $g(t)$ is assumed to be real valued, which is the usual domain of application
of the Hilbert transform. For this class of signals, the Hilbert transform has the following
properties.

A signal $g(t)$ and its Hilbert transform $\hat{g}(t)$ have the same magnitude spectrum.

That is to say,

$$ |\hat{G}(f)| = |G(f)| $$

Figure 2.15 (a) Magnitude response and (b) phase response of Hilbert transform. (Phase axis:
arg[H(f)], equal to +90° for negative frequencies and -90° for positive frequencies.)


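If $\hat{g}(t)$ is the Hilbert transform of $g(t)$, then the Hilbert transform of $\hat{g}(t)$ is $-g(t)$.
Another way of stating this property is to note that applying the frequency response
$-j\,\mathrm{sgn}(f)$ of (2.57) twice multiplies $G(f)$ by

$$ [-j\,\mathrm{sgn}(f)]^2 = -1, \qquad f \neq 0 $$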

A signal $g(t)$ and its Hilbert transform $\hat{g}(t)$ are orthogonal over the entire time interval
$(-\infty, \infty)$.

In mathematical terms, the orthogonality of $g(t)$ and $\hat{g}(t)$ is described by

$$ \int_{-\infty}^{\infty} g(t)\,\hat{g}(t)\, dt = 0 $$

Proofs of these properties follow from (2.54), (2.55), and (2.57). 
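The three properties can also be checked numerically. The sketch below is not a proof; it
assumes a hypothetical test signal with no zero-frequency content, sampled over a whole number
of periods, and replaces the Fourier transform and the integral by a DFT and a sum. All helper
names are illustrative.

```python
import cmath
import math


def dft(x, sign=-1):
    """Forward DFT for sign=-1; unnormalised inverse DFT for sign=+1."""
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def sgn_bin(k, N):
    """Sign of the frequency represented by DFT bin k (0 at zero frequency and Nyquist)."""
    if k == 0 or 2 * k == N:
        return 0
    return 1 if 2 * k < N else -1


def hilbert_transform(g):
    N = len(g)
    G_hat = [-1j * sgn_bin(k, N) * Gk for k, Gk in enumerate(dft(g))]
    return [v.real / N for v in dft(G_hat, sign=+1)]


N = 64
g = [math.cos(2 * math.pi * 4 * n / N) + 0.5 * math.sin(2 * math.pi * 7 * n / N)
     for n in range(N)]
g_hat = hilbert_transform(g)

# Same magnitude spectrum (checked on every nonzero-frequency bin).
G, G_hat = dft(g), dft(g_hat)
mag_err = max(abs(abs(a) - abs(b)) for a, b in zip(G[1:], G_hat[1:]))

# Transforming twice returns the negative of the original signal.
twice = hilbert_transform(g_hat)
sign_err = max(abs(a + b) for a, b in zip(twice, g))

# Orthogonality, with the integral replaced by a sum over the window.
inner = sum(a * b for a, b in zip(g, g_hat))

print(mag_err, sign_err, inner)
```

All three printed numbers should be at the level of floating-point rounding error.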

Example 5: Hilbert Transform of Low-Pass Signal

Consider Figure 2.16a that depicts the Fourier transform of a low-pass signal $g(t)$, whose
frequency content extends from $-W$ to $W$. Applying the Hilbert transform to this signal
yields a new signal $\hat{g}(t)$ whose Fourier transform, $\hat{G}(f)$, is depicted in Figure 2.16b. This
figure illustrates that the frequency content of a Fourier transformable signal can be
radically changed as a result of Hilbert transformation.

Figure 2.16 Illustrating application of the Hilbert transform to a low-pass signal:
(a) Spectrum of the signal $g(t)$; (b) Spectrum of the Hilbert transform $\hat{g}(t)$.


Pre-envelopes 


The Hilbert transform of a signal is defined for both positive and negative frequencies. In
light of the spectrum shaping illustrated in Example 5, a question that naturally arises is:

How can we modify the frequency content of a real-valued signal $g(t)$ such that all
negative-frequency components are completely eliminated?

The answer to this fundamental question lies in the idea of a complex-valued signal called
the pre-envelope of $g(t)$, formally defined as

$$ g_+(t) = g(t) + j\hat{g}(t) \tag{2.58} $$

where $\hat{g}(t)$ is the Hilbert transform of $g(t)$. According to this definition, the given signal
$g(t)$ is the real part of the pre-envelope $g_+(t)$, and the Hilbert transform $\hat{g}(t)$ is the
imaginary part of the pre-envelope. An important feature of the pre-envelope $g_+(t)$ is the
behavior of its Fourier transform. Let $G_+(f)$ denote the Fourier transform of $g_+(t)$. Then,
using (2.57) and (2.58) we may write

$$ G_+(f) = G(f) + \mathrm{sgn}(f)\, G(f) \tag{2.59} $$

Next, invoking the definition of the signum function given in (2.56), we may rewrite (2.59)
in the equivalent form

$$ G_+(f) = \begin{cases} 2G(f), & f > 0 \\ G(0), & f = 0 \\ 0, & f < 0 \end{cases} \tag{2.60} $$

where $G(0)$ is the value of $G(f)$ at the origin $f = 0$. Equation (2.60) clearly shows that the
pre-envelope of the signal $g(t)$ has no frequency content (i.e., its Fourier transform
vanishes) for all negative frequencies, and the question that was posed earlier has indeed
been answered. Note, however, that in order to do this, we had to introduce the complex-valued
version of a real-valued signal as described in (2.58).

From the foregoing analysis it is apparent that for a given signal $g(t)$ we may determine
its pre-envelope $g_+(t)$ in one of two equivalent procedures.

1. Time-domain procedure. Given the signal $g(t)$, we use (2.58) to compute the
pre-envelope $g_+(t)$.

2. Frequency-domain procedure. We first determine the Fourier transform $G(f)$ of the
signal $g(t)$, then use (2.60) to determine $G_+(f)$, and finally evaluate the inverse
Fourier transform of $G_+(f)$ to obtain

$$ g_+(t) = 2\int_0^{\infty} G(f) \exp(j2\pi f t)\, df \tag{2.61} $$

Depending on the description of the signal, procedure 1 may be easier than procedure 2, or
vice versa.
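The frequency-domain procedure can be sketched in discrete form as follows, with the DFT
standing in for the Fourier transform. The test signal, the helper names, and the decision to
leave the Nyquist bin unchanged are assumptions made for illustration only.

```python
import cmath
import math


def dft(x, sign=-1):
    """Forward DFT for sign=-1; unnormalised inverse DFT for sign=+1."""
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def pre_envelope(g):
    """Frequency-domain procedure: apply (2.60) bin by bin, then invert the transform."""
    N = len(g)
    G_plus = []
    for k, Gk in enumerate(dft(g)):
        if k == 0 or 2 * k == N:
            G_plus.append(Gk)            # f = 0 (and Nyquist): keep G unchanged
        elif 2 * k < N:
            G_plus.append(2 * Gk)        # f > 0: double the spectrum
        else:
            G_plus.append(0)             # f < 0: remove the spectrum
    return [v / N for v in dft(G_plus, sign=+1)]


N = 64
g = [1.0 + math.cos(2 * math.pi * 3 * n / N) for n in range(N)]   # hypothetical signal
g_plus = pre_envelope(g)

real_err = max(abs(v.real - u) for v, u in zip(g_plus, g))
imag_err = max(abs(v.imag - math.sin(2 * math.pi * 3 * n / N))
               for n, v in enumerate(g_plus))
print(real_err, imag_err)
```

The real part reproduces the signal and the imaginary part is its Hilbert transform, so both
procedures lead to the same pre-envelope for this example.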

Equation (2.58) defines the pre-envelope $g_+(t)$ for positive frequencies. Symmetrically,
we may define the pre-envelope for negative frequencies as

$$ g_-(t) = g(t) - j\hat{g}(t) \tag{2.62} $$

The two pre-envelopes $g_+(t)$ and $g_-(t)$ are simply the complex conjugate of each other, as
shown by

$$ g_-(t) = g_+^*(t) \tag{2.63} $$

where the asterisk denotes complex conjugation. The spectrum of the pre-envelope $g_+(t)$ is
nonzero only for positive frequencies; hence the use of a plus sign as the subscript. On the
other hand, the use of a minus sign as the subscript is intended to indicate that the
spectrum of the other pre-envelope $g_-(t)$ is nonzero only for negative frequencies, as
shown by the Fourier transform

$$ G_-(f) = \begin{cases} 0, & f > 0 \\ G(0), & f = 0 \\ 2G(f), & f < 0 \end{cases} \tag{2.64} $$

Thus, the pre-envelopes $g_+(t)$ and $g_-(t)$ constitute a complementary pair of complex-valued
signals. Note also that the sum of $g_+(t)$ and $g_-(t)$ is exactly twice the original signal $g(t)$.

Given a real-valued signal, (2.60) teaches us that the pre-envelope $g_+(t)$ is uniquely
defined by the spectral content of the signal for positive frequencies. By the same token,
(2.64) teaches us that the other pre-envelope $g_-(t)$ is uniquely defined by the spectral
content of the signal for negative frequencies. Since $g_-(t)$ is simply the complex conjugate
of $g_+(t)$ as indicated in (2.63), we may now make the following statement:

The spectral content of a Fourier transformable real-valued signal for positive frequencies
uniquely defines that signal.

In other words, given the spectral content of such a signal for positive frequencies, we may
uniquely define the spectral content of the signal for negative frequencies. Here then is the
mathematical justification for basing the bandwidth of a Fourier transformable signal on
its spectral content exclusively for positive frequencies, which is exactly what we did in
Section 2.4, dealing with bandwidth.


Example 6: Pre-envelopes of Low-Pass Signal

Continuing with the low-pass signal $g(t)$ considered in Example 5, Figure 2.17a and b depict
the corresponding spectra of the pre-envelope $g_+(t)$ and the second pre-envelope $g_-(t)$, both
of which belong to $g(t)$. Whereas the spectrum of $g(t)$ is defined for $-W \le f \le W$ as in Figure
2.16a, we clearly see from Figure 2.17 that the spectral content of $g_+(t)$ is confined entirely
to $0 \le f \le W$, and the spectral content of $g_-(t)$ is confined entirely to $-W \le f \le 0$.

Figure 2.17 Another illustrative application of the Hilbert transform to a low-pass signal:
(a) Spectrum of the pre-envelope $g_+(t)$; (b) Spectrum of the other pre-envelope $g_-(t)$.


An astute reader may see an analogy between the use of phasors and that of pre-envelopes. 
In particular, just as the use of phasors simplifies the manipulations of alternating currents 
and voltages in the study of circuit theory, so we find the pre-envelope simplifies the 
analysis of band-pass signals and band-pass systems in signal theory. 

More specifically, by applying the concept of pre-envelope to a band-pass signal, the 
signal is transformed into an equivalent low-pass representation. In a corresponding way, a 
band-pass filter is transformed into its own equivalent low-pass representation. Both 
transformations, rooted in the Hilbert transform, play a key role in the formulation of 
modulated signals and their demodulation, as demonstrated in what follows in this and 
subsequent chapters. 

Complex Envelopes of Band-Pass Signals 


The idea of pre-envelopes introduced in Section 2.9 applies to any real-valued signal, be it 
of a low-pass or band-pass kind; the only requirement is that the signal be Fourier 
transformable. From this point on and for the rest of the chapter, we will restrict attention 
to band-pass signals. Such signals are exemplified by signals modulated onto a sinusoidal
carrier. In a corresponding way, when it comes to systems we restrict attention to
band-pass systems. The primary reason for these restrictions is that the material so presented is
directly applicable to analog modulation theory, to be covered in Section 2.14, as well as
other digital modulation schemes covered in subsequent chapters of the book. With this
objective in mind and the desire to make a consistent use of notation with respect to
material to be presented in subsequent chapters, henceforth we will use $s(t)$ to denote a
modulated signal. When such a signal is applied to the input of a band-pass system, such
as a communication channel, we will use $x(t)$ to denote the resulting system (e.g., channel)
output. However, as before, we will use $h(t)$ as the impulse response of the system.

To proceed then, let the band-pass signal of interest be denoted by $s(t)$ and its Fourier
transform be denoted by $S(f)$. We assume that the Fourier transform $S(f)$ is essentially
confined to a band of frequencies of total extent $2W$, centered about some frequency $\pm f_c$, as
illustrated in Figure 2.18a. We refer to $f_c$ as the carrier frequency; this terminology is
borrowed from modulation theory. In the majority of communication signals encountered
in practice, we find that the bandwidth $2W$ is small compared with $f_c$, so we may refer to
the signal $s(t)$ as a narrowband signal. However, a precise statement about how small the
bandwidth must be for the signal to be considered narrowband is not necessary for our
present discussion. Hereafter, the terms band-pass and narrowband are used
interchangeably.

Figure 2.18 (a) Magnitude spectrum of band-pass signal $s(t)$; (b) Magnitude spectrum of
pre-envelope $s_+(t)$; (c) Magnitude spectrum of complex envelope $\tilde{s}(t)$.

Let the pre-envelope of the narrowband signal $s(t)$ be expressed in the form

$$ s_+(t) = \tilde{s}(t)\exp(j2\pi f_c t) \tag{2.65} $$

We refer to $\tilde{s}(t)$ as the complex envelope of the band-pass signal $s(t)$. Equation (2.65)
may be viewed as the basis of a definition for the complex envelope $\tilde{s}(t)$ in terms of the
pre-envelope $s_+(t)$. In light of the narrowband assumption imposed on the spectrum of
the band-pass signal $s(t)$, we find that the spectrum of the pre-envelope $s_+(t)$ is limited
to the positive frequency band $f_c - W \le f \le f_c + W$, as illustrated in Figure 2.18b.
Therefore, applying the frequency-shifting property of the Fourier transform to (2.65),
we find that the spectrum of the complex envelope $\tilde{s}(t)$ is correspondingly limited to
the band $-W \le f \le W$ and centered at the origin $f = 0$, as illustrated in Figure 2.18c. In
other words, the complex envelope $\tilde{s}(t)$ of the band-pass signal $s(t)$ is a complex
low-pass signal. The essence of the mapping from the band-pass signal $s(t)$ to the complex
low-pass signal $\tilde{s}(t)$ is summarized in the following threefold statement:

• The information content of a modulated signal $s(t)$ is fully preserved in the complex
envelope $\tilde{s}(t)$.

• Analysis of the band-pass signal $s(t)$ is complicated by the presence of the carrier
frequency $f_c$; in contrast, the complex envelope $\tilde{s}(t)$ dispenses with $f_c$, making its
analysis simpler to deal with.

• The use of $\tilde{s}(t)$ requires having to handle complex notations.

Canonical Representation of Band-Pass Signals 


By definition, the real part of the pre-envelope $s_+(t)$ is equal to the original band-pass
signal $s(t)$. We may therefore express the band-pass signal $s(t)$ in terms of its
corresponding complex envelope $\tilde{s}(t)$ as

$$ s(t) = \mathrm{Re}[\tilde{s}(t)\exp(j2\pi f_c t)] \tag{2.66} $$

where the operator Re[·] denotes the real part of the quantity enclosed inside the square
brackets. Since, in general, $\tilde{s}(t)$ is a complex-valued quantity, we emphasize this property
by expressing it in the Cartesian form

$$ \tilde{s}(t) = s_I(t) + j s_Q(t) \tag{2.67} $$

where $s_I(t)$ and $s_Q(t)$ are both real-valued low-pass functions; their low-pass property is
inherited from the complex envelope $\tilde{s}(t)$. We may therefore use (2.67) in (2.66) to
express the original band-pass signal $s(t)$ in the canonical or standard form

$$ s(t) = s_I(t)\cos(2\pi f_c t) - s_Q(t)\sin(2\pi f_c t) \tag{2.68} $$

We refer to $s_I(t)$ as the in-phase component of the band-pass signal $s(t)$ and refer to $s_Q(t)$ as
the quadrature-phase component or simply the quadrature component of the signal $s(t)$.
This nomenclature follows from the following observation: if $\cos(2\pi f_c t)$, the multiplying
factor of $s_I(t)$, is viewed as the reference sinusoidal carrier, then $\sin(2\pi f_c t)$, the multiplying
factor of $s_Q(t)$, is in phase quadrature with respect to $\cos(2\pi f_c t)$.
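The equivalence of (2.66) and (2.68) is easy to confirm numerically. In the sketch below the
carrier frequency and the two low-pass components are hypothetical choices, and the function
names are illustrative only.

```python
import cmath
import math

f_c = 10.0                                   # hypothetical carrier frequency, Hz


def s_I(t):
    return math.cos(2 * math.pi * 0.5 * t)           # hypothetical in-phase component


def s_Q(t):
    return 0.3 * math.sin(2 * math.pi * 0.25 * t)    # hypothetical quadrature component


def bandpass_from_envelope(t):
    """Equation (2.66): real part of the complex envelope times the carrier exponential."""
    s_tilde = complex(s_I(t), s_Q(t))                # Cartesian form (2.67)
    return (s_tilde * cmath.exp(2j * math.pi * f_c * t)).real


def bandpass_canonical(t):
    """Equation (2.68): canonical form built from s_I(t) and s_Q(t)."""
    return (s_I(t) * math.cos(2 * math.pi * f_c * t)
            - s_Q(t) * math.sin(2 * math.pi * f_c * t))


times = [n * 0.001 for n in range(2000)]
worst = max(abs(bandpass_from_envelope(t) - bandpass_canonical(t)) for t in times)
print(worst)
```

The minus sign in front of the quadrature term in (2.68) comes from the product of the two
imaginary parts when the real part in (2.66) is expanded.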

According to (2.66), the complex envelope $\tilde{s}(t)$ may be pictured as a time-varying
phasor positioned at the origin of the $(s_I, s_Q)$-plane, as indicated in Figure 2.19a. With
time $t$ varying continuously, the end of the phasor moves about in the plane. Figure 2.19b
depicts the phasor representation of the complex exponential $\exp(j2\pi f_c t)$. In the definition
given in (2.66), the complex envelope $\tilde{s}(t)$ is multiplied by the complex exponential
$\exp(j2\pi f_c t)$. The angles of these two phasors, therefore, add and their lengths multiply, as
shown in Figure 2.19c. Moreover, in this latter figure, we show the $(s_I, s_Q)$-plane rotating
with an angular velocity equal to $2\pi f_c$ radians per second. Thus, in the picture portrayed in
the figure, the phasor representing the complex envelope $\tilde{s}(t)$ moves in the $(s_I, s_Q)$-plane,
while at the very same time the plane itself rotates about the origin. The original band-pass
signal $s(t)$ is the projection of this time-varying phasor on a fixed line representing the real
axis, as indicated in Figure 2.19c.

Figure 2.19 Illustrating an interpretation of the complex envelope $\tilde{s}(t)$ and its
multiplication by $\exp(j2\pi f_c t)$.

Since both $s_I(t)$ and $s_Q(t)$ are low-pass signals limited to the band $-W \le f \le W$, they may
be extracted from the band-pass signal $s(t)$ using the scheme shown in Figure 2.20a. Both
low-pass filters in this figure are designed identically, each with a bandwidth equal to $W$.
To reconstruct $s(t)$ from its in-phase and quadrature components, we may use the scheme
shown in Figure 2.20b. In light of these statements, we may refer to the scheme in Figure
2.20a as an analyzer, in the sense that it extracts the in-phase and quadrature components,
$s_I(t)$ and $s_Q(t)$, from the band-pass signal $s(t)$. By the same token, we may refer to the
second scheme in Figure 2.20b as a synthesizer, in the sense that it reconstructs the band-pass
signal $s(t)$ from its in-phase and quadrature components, $s_I(t)$ and $s_Q(t)$.

Figure 2.20 (a) Scheme for deriving the in-phase and quadrature components of a band-pass
signal $s(t)$. (b) Scheme for reconstructing the band-pass signal from its in-phase and
quadrature components.

The two schemes shown in Figure 2.20 are basic to the study of linear modulation
schemes, be they of an analog or digital kind. Multiplication of the low-pass in-phase
component $s_I(t)$ by $\cos(2\pi f_c t)$ and multiplication of the quadrature component $s_Q(t)$ by
$\sin(2\pi f_c t)$ represent linear forms of modulation. Provided that the carrier frequency $f_c$ is
larger than the low-pass bandwidth $W$, the resulting band-pass function $s(t)$ defined in
(2.68) is referred to as a passband signal waveform. Correspondingly, the mapping from
$s_I(t)$ and $s_Q(t)$ combined into $s(t)$ is known as passband modulation.
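The analyzer and synthesizer can be imitated in a few lines of Python. Since Figure 2.20 is not
reproduced here, the sketch assumes the usual arrangement in which the analyzer multiplies s(t)
by 2cos(2πf_c t) and by -2sin(2πf_c t) before low-pass filtering; the moving average over one
carrier period is a crude stand-in for the two identical low-pass filters, and all numerical
values are hypothetical.

```python
import math

f_c = 100.0      # hypothetical carrier frequency, Hz
f_s = 1600.0     # hypothetical sampling rate, Hz (16 samples per carrier cycle)
L = 16           # averaging window of one carrier period (crude low-pass filter)


def s_I(t):
    return math.cos(2 * math.pi * 1.0 * t)           # hypothetical in-phase component


def s_Q(t):
    return 0.5 * math.sin(2 * math.pi * 1.0 * t)     # hypothetical quadrature component


def synthesize(t):
    """Synthesizer: build s(t) from s_I(t) and s_Q(t) using the canonical form (2.68)."""
    return (s_I(t) * math.cos(2 * math.pi * f_c * t)
            - s_Q(t) * math.sin(2 * math.pi * f_c * t))


def moving_average(v, length):
    return [sum(v[n:n + length]) / length for n in range(len(v) - length + 1)]


times = [n / f_s for n in range(1600)]
s = [synthesize(t) for t in times]

# Analyzer: mix with 2cos and -2sin, then low-pass filter each product.
i_hat = moving_average([2 * v * math.cos(2 * math.pi * f_c * t) for v, t in zip(s, times)], L)
q_hat = moving_average([-2 * v * math.sin(2 * math.pi * f_c * t) for v, t in zip(s, times)], L)

# Each average is centred (L - 1) / 2 samples after the first sample of its window.
centres = [(n + (L - 1) / 2) / f_s for n in range(len(i_hat))]
err_i = max(abs(a - s_I(t)) for a, t in zip(i_hat, centres))
err_q = max(abs(a - s_Q(t)) for a, t in zip(q_hat, centres))
print(f"max error in s_I: {err_i:.4f}   max error in s_Q: {err_q:.4f}")
```

The residual errors are small but not zero, because the moving average does not remove the
double-frequency terms perfectly when the components vary within the averaging window.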


Equation (2.67) is the Cartesian form of defining the complex envelope $\tilde{s}(t)$ of the
band-pass signal $s(t)$. Alternatively, we may define $\tilde{s}(t)$ in the polar form as

$$ \tilde{s}(t) = a(t)\exp[j\phi(t)] \tag{2.69} $$

where $a(t)$ and $\phi(t)$ are both real-valued low-pass functions. Based on the polar
representation of (2.69), the original band-pass signal $s(t)$ is itself defined by

$$ s(t) = a(t)\cos[2\pi f_c t + \phi(t)] \tag{2.70} $$

We refer to $a(t)$ as the natural envelope or simply the envelope of the band-pass signal $s(t)$
and refer to $\phi(t)$ as the phase of the signal. We now see why the term “pre-envelope” was
used in referring to (2.58), the formulation of which preceded that of (2.70).

The envelope $a(t)$ and phase $\phi(t)$ of a band-pass signal $s(t)$ are respectively related to the
in-phase and quadrature components $s_I(t)$ and $s_Q(t)$ as follows (see the time-varying
phasor representation of Figure 2.19a):

$$ a(t) = \sqrt{s_I^2(t) + s_Q^2(t)} \tag{2.71} $$

and

$$ \phi(t) = \tan^{-1}\!\left(\frac{s_Q(t)}{s_I(t)}\right) \tag{2.72} $$

Conversely, we may write

$$ s_I(t) = a(t)\cos[\phi(t)] \tag{2.73} $$

and

$$ s_Q(t) = a(t)\sin[\phi(t)] \tag{2.74} $$

Thus, both the in-phase and quadrature components of a band-pass signal contain
amplitude and phase information, both of which are uniquely defined for a prescribed
phase $\phi(t)$, modulo $2\pi$.
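The conversions between the Cartesian and polar descriptions can be sketched as below. One
implementation choice is worth flagging as an assumption of this sketch: the inverse tangent is
evaluated with the four-quadrant function `math.atan2`, because the ratio of the quadrature to
the in-phase component alone does not distinguish a phase from the same phase shifted by π. The
function names and sample values are hypothetical.

```python
import math


def envelope_and_phase(s_i, s_q):
    """Envelope and phase from the in-phase and quadrature components."""
    a = math.hypot(s_i, s_q)          # sqrt(s_i^2 + s_q^2)
    phi = math.atan2(s_q, s_i)        # four-quadrant inverse tangent, in (-pi, pi]
    return a, phi


def in_phase_and_quadrature(a, phi):
    """In-phase and quadrature components from the envelope and phase."""
    return a * math.cos(phi), a * math.sin(phi)


for s_i, s_q in [(3.0, 4.0), (-1.0, 1.0), (-2.0, -2.0), (0.0, -1.5)]:
    a, phi = envelope_and_phase(s_i, s_q)
    back_i, back_q = in_phase_and_quadrature(a, phi)
    print(f"s_I={s_i:+.1f} s_Q={s_q:+.1f} -> a={a:.4f}, phi={phi:+.4f} rad "
          f"-> ({back_i:+.4f}, {back_q:+.4f})")
```

The round trip returns the original pair in every quadrant, including the case in which the
in-phase component is zero and the plain ratio would be undefined.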


Complex Low-Pass Representations of Band-Pass Systems 


Now that we know how to handle the complex low-pass representation of band-pass 
signals, it is logical that we develop a corresponding procedure for handling the 
representation of linear time-invariant band-pass systems. Specifically, we wish to show 
that the analysis of band-pass systems is greatly simplified by establishing an analogy, 
more precisely an isomorphism, between band-pass and low-pass systems. For example, 
this analogy would help us to facilitate the computer simulation of a wireless 
communication channel driven by a sinusoidally modulated signal, which otherwise could 
be a difficult proposition. 

Consider a narrowband signal $s(t)$, with its Fourier transform denoted by $S(f)$. We
assume that the spectrum of the signal $s(t)$ is limited to frequencies within $\pm W$ hertz of the
carrier frequency $f_c$. We also assume that $W < f_c$. Let the signal $s(t)$ be applied to a linear
time-invariant band-pass system with impulse response $h(t)$ and frequency response $H(f)$.
We assume that the frequency response of the system is limited to frequencies within $\pm B$
of the carrier frequency $f_c$. The system bandwidth $2B$ is usually narrower than or equal to
the input signal bandwidth $2W$. We wish to represent the band-pass impulse response $h(t)$
in terms of two quadrature components, denoted by $h_I(t)$ and $h_Q(t)$. In particular, by
analogy to the representation of band-pass signals, we express $h(t)$ in the form

$$ h(t) = h_I(t)\cos(2\pi f_c t) - h_Q(t)\sin(2\pi f_c t) \tag{2.75} $$

Correspondingly, we define the complex impulse response of the band-pass system as

$$ \tilde{h}(t) = h_I(t) + j h_Q(t) \tag{2.76} $$

Hence, following (2.66), we may express $h(t)$ in terms of $\tilde{h}(t)$ as

$$ h(t) = \mathrm{Re}[\tilde{h}(t)\exp(j2\pi f_c t)] \tag{2.77} $$

Note that $h_I(t)$, $h_Q(t)$, and $\tilde{h}(t)$ are all low-pass functions, limited to the frequency band
$-B \le f \le B$.

We may determine the complex impulse response $\tilde{h}(t)$ in terms of the in-phase and
quadrature components $h_I(t)$ and $h_Q(t)$ of the band-pass impulse response $h(t)$ by building
on (2.76). Alternatively, we may determine it from the band-pass frequency response $H(f)$
in the following way. We first use (2.77) to write

$$ 2h(t) = \tilde{h}(t)\exp(j2\pi f_c t) + \tilde{h}^*(t)\exp(-j2\pi f_c t) \tag{2.78} $$

where $\tilde{h}^*(t)$ is the complex conjugate of $\tilde{h}(t)$; the rationale for introducing the factor of 2
on the left-hand side of (2.78) follows from the fact that if we add a complex signal and its
complex conjugate, the sum adds up to twice the real part and the imaginary parts cancel.
Applying the Fourier transform to both sides of (2.78) and using the complex-conjugation
property of the Fourier transform, we get

$$ 2H(f) = \tilde{H}(f - f_c) + \tilde{H}^*(-f - f_c) \tag{2.79} $$

where $h(t) \rightleftharpoons H(f)$ and $\tilde{h}(t) \rightleftharpoons \tilde{H}(f)$. Equation (2.79) satisfies the requirement that
$H^*(f) = H(-f)$ for a real-valued impulse response $h(t)$. Since $\tilde{H}(f)$ represents a low-pass
frequency response limited to $|f| \le B$ with $B < f_c$, we infer from (2.79) that

$$ \tilde{H}(f - f_c) = 2H(f), \qquad f > 0 \tag{2.80} $$

Equation (2.80) states:

For a specified band-pass frequency response $H(f)$, we may determine the corresponding
complex low-pass frequency response $\tilde{H}(f)$ by taking the part of $H(f)$ defined for positive
frequencies, shifting it to the origin, and scaling it by the factor 2.

Having determined the complex frequency response $\tilde{H}(f)$, we decompose it into its
in-phase and quadrature components, as shown by

$$ \tilde{H}(f) = \tilde{H}_I(f) + j\tilde{H}_Q(f) \tag{2.81} $$

where the in-phase component is defined by

$$ \tilde{H}_I(f) = \frac{1}{2}\left[\tilde{H}(f) + \tilde{H}^*(-f)\right] \tag{2.82} $$

and the quadrature component is defined by

$$ \tilde{H}_Q(f) = \frac{1}{2j}\left[\tilde{H}(f) - \tilde{H}^*(-f)\right] \tag{2.83} $$

Finally, to determine the complex impulse response $\tilde{h}(t)$ of the band-pass system, we take
the inverse Fourier transform of $\tilde{H}(f)$, obtaining

$$ \tilde{h}(t) = \int_{-\infty}^{\infty} \tilde{H}(f)\exp(j2\pi f t)\, df \tag{2.84} $$

which is the formula we have been seeking.
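The steps from the band-pass frequency response to the complex low-pass response and its two
components can be sketched as follows. The band-pass response used here (a passband with a
linear tilt of half-width B around the midband frequency) is purely hypothetical, as are the
function names; the component formulas are the ones that follow from taking the Fourier
transforms of the real and imaginary parts of the complex impulse response.

```python
f_c = 1000.0     # hypothetical midband frequency, Hz
B = 100.0        # hypothetical half-bandwidth, Hz


def H(f):
    """Hypothetical band-pass response with a tilted passband; real, so H(-f) = H*(f)."""
    offset = abs(f) - f_c
    if abs(offset) > B:
        return 0j
    return complex(1.0 + 0.5 * offset / B, 0.0)


def H_tilde(f):
    """Equation (2.80) rewritten as H~(f) = 2 H(f + f_c) for f + f_c > 0."""
    return 2 * H(f + f_c) if f + f_c > 0 else 0j


def H_tilde_I(f):
    """In-phase component: half the sum of H~(f) and the conjugate of H~(-f)."""
    return 0.5 * (H_tilde(f) + H_tilde(-f).conjugate())


def H_tilde_Q(f):
    """Quadrature component: the difference of the same two terms divided by 2j."""
    return (H_tilde(f) - H_tilde(-f).conjugate()) / 2j


for f in (-80.0, -20.0, 0.0, 35.0, 99.0):
    recombined = H_tilde_I(f) + 1j * H_tilde_Q(f)
    print(f, H_tilde(f), H_tilde_I(f), H_tilde_Q(f), abs(recombined - H_tilde(f)))
```

For this example the in-phase component is flat across the band while the quadrature component
is purely imaginary and odd in frequency, which is what a real-valued quadrature impulse
response requires.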

Putting the Complex Representations of Band-Pass Signals
and Systems All Together


Examining (2.66) and (2.77), we immediately see that these two equations share a
common multiplying factor: the exponential $\exp(j2\pi f_c t)$. In practical terms, the inclusion
of this factor accounts for a sinusoidal carrier of frequency $f_c$, which facilitates
transmission of the modulated (band-pass) signal $s(t)$ across a band-pass channel of
midband frequency $f_c$. In analytic terms, however, the presence of this exponential factor
in both (2.66) and (2.77) complicates the analysis of the band-pass system driven by the
modulated signal $s(t)$. This analysis can be simplified through the combined use of
complex low-pass equivalent representations of both the modulated signal $s(t)$ and the
band-pass system characterized by the impulse response $h(t)$. The simplification can be
carried out in the time domain or frequency domain, as discussed next.


Equipped with the complex representations of band-pass signals and systems, we are
ready to derive an analytically efficient method for determining the output of a band-pass
system driven by a corresponding band-pass signal. To proceed with the derivation,
assume that $S(f)$, denoting the spectrum of the input signal $s(t)$, and $H(f)$, denoting the
frequency response of the system, are both centered around the same frequency $f_c$. In
practice, there is no need to consider a situation in which the carrier frequency of the input
signal is not aligned with the midband frequency of the band-pass system, since we have
considerable freedom in choosing the carrier or midband frequency. Thus, changing the
carrier frequency of the input signal by an amount $\Delta f_c$, for example, simply corresponds to
absorbing (or removing) the factor $\exp(\pm j2\pi\,\Delta f_c\, t)$ in the complex envelope of the input
signal or the complex impulse response of the band-pass system. We are therefore justified
in proceeding on the assumption that $S(f)$ and $H(f)$ are both centered around the same
carrier frequency $f_c$.

Let $x(t)$ denote the output signal of the band-pass system produced in response to the
incoming band-pass signal $s(t)$. Clearly, $x(t)$ is also a band-pass signal, so we may
represent it in terms of its own low-pass complex envelope $\tilde{x}(t)$ as

$$ x(t) = \mathrm{Re}[\tilde{x}(t)\exp(j2\pi f_c t)] \tag{2.85} $$

The output signal $x(t)$ is related to the input signal $s(t)$ and impulse response $h(t)$ of the
system in the usual way by the convolution integral

$$ x(t) = \int_{-\infty}^{\infty} h(\tau)\, s(t - \tau)\, d\tau \tag{2.86} $$

In terms of pre-envelopes, we have $h(t) = \mathrm{Re}[h_+(t)]$ and $s(t) = \mathrm{Re}[s_+(t)]$. We may therefore
rewrite (2.86) in terms of the pre-envelopes $s_+(t)$ and $h_+(t)$ as

$$ x(t) = \int_{-\infty}^{\infty} \mathrm{Re}[h_+(\tau)]\,\mathrm{Re}[s_+(t - \tau)]\, d\tau \tag{2.87} $$

To proceed further, we make use of a basic property of pre-envelopes that is described by
the following relation:

$$ \int_{-\infty}^{\infty} \mathrm{Re}[h_+(\tau)]\,\mathrm{Re}[s_+(\tau)]\, d\tau = \frac{1}{2}\mathrm{Re}\left[\int_{-\infty}^{\infty} h_+(\tau)\, s_+^*(\tau)\, d\tau\right] \tag{2.88} $$

where we have used $\tau$ as the integration variable to be consistent with that in (2.87); details
of (2.88) are presented in Problem 2.20. Next, from Fourier-transform theory we note that
using $s(-\tau)$ in place of $s(\tau)$ has the effect of removing the complex conjugation on the
right-hand side of (2.88). Hence, bearing in mind the algebraic difference between the
argument of $s_+(\tau)$ in (2.88) and that of $s_+(t - \tau)$ in (2.87), and using the relationship
between the pre-envelope and complex envelope of a band-pass signal, we may express
(2.87) in the equivalent form

$$ \begin{aligned} x(t) &= \frac{1}{2}\mathrm{Re}\left[\int_{-\infty}^{\infty} h_+(\tau)\, s_+(t - \tau)\, d\tau\right] \\ &= \frac{1}{2}\mathrm{Re}\left[\int_{-\infty}^{\infty} \tilde{h}(\tau)\exp(j2\pi f_c \tau)\, \tilde{s}(t - \tau)\exp[j2\pi f_c (t - \tau)]\, d\tau\right] \\ &= \frac{1}{2}\mathrm{Re}\left[\exp(j2\pi f_c t)\int_{-\infty}^{\infty} \tilde{h}(\tau)\, \tilde{s}(t - \tau)\, d\tau\right] \end{aligned} \tag{2.89} $$

Thus, comparing the right-hand sides of (2.85) and (2.89), we readily find that for a large
enough carrier frequency $f_c$, the complex envelope $\tilde{x}(t)$ of the output signal is simply
defined in terms of the complex envelope $\tilde{s}(t)$ of the input signal and the complex impulse
response $\tilde{h}(t)$ of the band-pass system as follows:

$$ \tilde{x}(t) = \frac{1}{2}\int_{-\infty}^{\infty} \tilde{h}(\tau)\, \tilde{s}(t - \tau)\, d\tau \tag{2.90} $$

This important relationship is the result of the isomorphism between a band-pass function
and the corresponding complex low-pass function, in light of which we may now make the
following summarizing statement:

Except for the scaling factor 1/2, the complex envelope $\tilde{x}(t)$ of the output signal of a
band-pass system is obtained by convolving the complex impulse response $\tilde{h}(t)$ of the
system with the complex envelope $\tilde{s}(t)$ of the input band-pass signal.

In computational terms, the significance of this statement is profound. Specifically, in
dealing with band-pass signals and systems, we need only concern ourselves with the
functions $\tilde{s}(t)$, $\tilde{x}(t)$, and $\tilde{h}(t)$, representing the complex low-pass equivalents of the
excitation applied to the input of the system, the response produced at the output of the
system, and the impulse response of the system, respectively, as illustrated in Figure 2.21.
The essence of the filtering process performed in the original system of Figure 2.21a is
completely retained in the complex low-pass equivalent representation depicted in Figure
2.21b.
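A discrete-time sketch of this equivalence is given below. It assumes a sampling period of one
(so that sums replace the integrals), a hypothetical carrier of eight samples per cycle, and
smooth Gaussian-shaped low-pass sequences for the complex envelope and complex impulse response;
none of these choices comes from the text. The band-pass output is computed twice: directly from
the band-pass convolution, and through the low-pass route of (2.90) followed by (2.85).

```python
import cmath
import math


def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            out[i + k] += ai * bk
    return out


M = 64                                   # samples per sequence
omega_c = 2 * math.pi / 8                # hypothetical carrier: 8 samples per cycle
bump = [math.exp(-((n - 32) / 8) ** 2) for n in range(M)]   # smooth low-pass shape

h_tilde = [complex(b, 0.0) for b in bump]        # hypothetical complex impulse response
s_tilde = [(1 + 0.5j) * b for b in bump]         # hypothetical complex envelope


def to_bandpass(env):
    """Band-pass sequence Re[env[n] exp(j omega_c n)], as in (2.66) and (2.77)."""
    return [(v * cmath.exp(1j * omega_c * n)).real for n, v in enumerate(env)]


# Direct route: convolve the two band-pass sequences.
x_direct = convolve(to_bandpass(h_tilde), to_bandpass(s_tilde))

# Low-pass route: half the complex convolution (2.90), then back to the pass band (2.85).
x_tilde = [0.5 * v for v in convolve(h_tilde, s_tilde)]
x_lowpass = to_bandpass(x_tilde)

worst = max(abs(a - b) for a, b in zip(x_direct, x_lowpass))
print(worst, max(abs(v) for v in x_direct))
```

The two routes agree to within a very small residual. The residual is not exactly zero because
the equivalence rests on the low-pass functions having negligible spectral content near twice
the carrier frequency, which is the "large enough carrier frequency" condition in the text.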

Figure 2.21 (a) Input-output description of a band-pass system; (b) Complex low-pass
equivalent model of the band-pass system.

The complex envelope $\tilde{s}(t)$ of the input band-pass signal and the complex impulse
response $\tilde{h}(t)$ of the band-pass system are defined in terms of their respective in-phase
and quadrature components by (2.67) and (2.76), respectively. Substituting these relations
into (2.90), we get

$$ \begin{aligned} 2\tilde{x}(t) &= \tilde{h}(t) \star \tilde{s}(t) \\ &= [h_I(t) + j h_Q(t)] \star [s_I(t) + j s_Q(t)] \end{aligned} \tag{2.91} $$

where the symbol $\star$ denotes convolution. Because convolution is distributive, we may
rewrite (2.91) in the equivalent form

$$ 2\tilde{x}(t) = [h_I(t) \star s_I(t) - h_Q(t) \star s_Q(t)] + j[h_Q(t) \star s_I(t) + h_I(t) \star s_Q(t)] \tag{2.92} $$

Let the complex envelope $\tilde{x}(t)$ of the response be defined in terms of its in-phase and
quadrature components as

$$ \tilde{x}(t) = x_I(t) + j x_Q(t) \tag{2.93} $$

Then, comparing the real and imaginary parts in (2.92) and (2.93), we find that the
in-phase component $x_I(t)$ is defined by the relation

$$ 2x_I(t) = h_I(t) \star s_I(t) - h_Q(t) \star s_Q(t) $$

and its quadrature component $x_Q(t)$ is defined by the relation

$$ 2x_Q(t) = h_Q(t) \star s_I(t) + h_I(t) \star s_Q(t) $$

Thus, for the purpose of evaluating the in-phase and quadrature components of the
complex envelope $\tilde{x}(t)$ of the system output, we may use the low-pass equivalent model
shown in Figure 2.22. All the signals and impulse responses shown in this model are
real-valued low-pass functions; hence a time-domain procedure for simplifying the analysis of
band-pass systems driven by band-pass signals.
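The model built from four real convolutions can be checked against a single complex
convolution. The short sequences below are hypothetical samples (sampling period taken as one),
and the helper names are illustrative.

```python
def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            out[i + k] += ai * bk
    return out


# Hypothetical sampled in-phase and quadrature components.
h_I = [0.5, 0.3, 0.1]
h_Q = [0.0, 0.2, -0.1]
s_I = [1.0, 0.0, -1.0, 0.5]
s_Q = [0.5, 0.5, 0.0, -0.5]

# Four real convolutions, combined as in the relations for 2 x_I(t) and 2 x_Q(t).
x_I = [0.5 * (u - v) for u, v in zip(convolve(h_I, s_I), convolve(h_Q, s_Q))]
x_Q = [0.5 * (u + v) for u, v in zip(convolve(h_Q, s_I), convolve(h_I, s_Q))]

# Reference: one complex convolution, as in (2.90).
h_tilde = [complex(a, b) for a, b in zip(h_I, h_Q)]
s_tilde = [complex(a, b) for a, b in zip(s_I, s_Q)]
x_tilde = [0.5 * v for v in convolve(h_tilde, s_tilde)]

worst = max(abs(x - complex(i, q)) for x, i, q in zip(x_tilde, x_I, x_Q))
print(worst)
```

Working with four real convolutions avoids complex arithmetic altogether, which can be
convenient when the processing is carried out with real-valued filters.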


Figure 2.22 Block diagram illustrating the relationship between the in-phase and quadrature
components of the response of a band-pass filter and those of the input signal.

Alternatively, Fourier-transforming the convolution integral of (2.90) and recognizing that
convolution in the time domain is changed into multiplication in the frequency domain, we
get

$$ \tilde{X}(f) = \frac{1}{2}\tilde{H}(f)\,\tilde{S}(f) $$

where $\tilde{s}(t) \rightleftharpoons \tilde{S}(f)$, $\tilde{h}(t) \rightleftharpoons \tilde{H}(f)$, and $\tilde{x}(t) \rightleftharpoons \tilde{X}(f)$. The $\tilde{H}(f)$ is itself related to the
frequency response $H(f)$ of the band-pass system by (2.80). Thus, assuming that $H(f)$ is
known, we may use the frequency-domain procedure summarized in Table 2.4 for
computing the system output $x(t)$ in response to the system input $s(t)$.

In actual fact, the procedure of Table 2.4 is the frequency-domain representation of the
low-pass equivalent to the band-pass system, depicted in Figure 2.21b. In computational
terms, this procedure is of profound practical significance. We say so because its use
alleviates the analytic and computational difficulty encountered in having to include the
carrier frequency $f_c$ in the pertinent calculations.

As discussed earlier in the chapter, the theoretical formulation of the low-pass 
equivalent in Figure 2.21b is rooted in the Hilbert transformation, the evaluation of which 
poses a practical problem of its own, because of the wideband 90°-phase shifter involved 
in its theory. Fortunately, however, we do not need to invoke the Hilbert transform in 
constructing the low-pass equivalent. This is indeed so, when a message signal modulated 
onto a sinusoidal carrier is processed by a band-pass filter, as explained here: 

1. Typically, the message signal is band limited for all practical purposes. Moreover, the carrier frequency is larger than the highest frequency component of the signal; the modulated signal is therefore a band-pass signal with a well-defined passband. Hence, the in-phase and quadrature components of the modulated signal $s(t)$, represented respectively by $s_I(t)$ and $s_Q(t)$, are readily obtained from the canonical representation of $s(t)$, described in (2.68).

2. Given the well-defined frequency response $H(f)$ of the band-pass system, we may readily evaluate the corresponding complex low-pass frequency response $\tilde{H}(f)$; see (2.80). Hence, we may compute the system output $x(t)$ produced in response to the carrier-modulated input $s(t)$ without invoking the Hilbert transform.

Table 2.4: Procedure for the computational analysis of a band-pass system driven by a band-pass signal

Given the frequency response $H(f)$ of a band-pass system, computation of the output signal $x(t)$ of the system in response to an input band-pass signal $s(t)$ is summarized as follows:

1. Use (2.80), namely $\tilde{H}(f - f_c) = 2H(f)$ for $f > 0$, to determine $\tilde{H}(f)$.

2. Expressing the input band-pass signal $s(t)$ in the canonical form of (2.68), evaluate the complex envelope $\tilde{s}(t) = s_I(t) + js_Q(t)$, where $s_I(t)$ is the in-phase component of $s(t)$ and $s_Q(t)$ is its quadrature component. Hence, compute the Fourier transform $\tilde{S}(f) = \mathbf{F}[\tilde{s}(t)]$.

3. Using (2.96), compute $\tilde{X}(f) = \frac{1}{2}\tilde{H}(f)\tilde{S}(f)$, which defines the Fourier transform of the complex envelope $\tilde{x}(t)$ of the output signal $x(t)$.

4. Compute the inverse Fourier transform of $\tilde{X}(f)$, yielding $\tilde{x}(t) = \mathbf{F}^{-1}[\tilde{X}(f)]$.

5. Use (2.85) to compute the desired output signal $x(t) = \text{Re}[\tilde{x}(t)\exp(j2\pi f_c t)]$.
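The five steps of Table 2.4 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the channel is taken to be a pure gain-and-delay system, the message is a Gaussian pulse, a direct DFT stands in for the FFT, and all names and parameter values (`FC`, `TS`, `K`, `DELAY`, `h_band`, `h_low`) are hypothetical choices rather than anything specified in the text.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) DFT (or inverse DFT); a stand-in for the FFT."""
    n_pts = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n_pts):
        acc = sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / n_pts)
                  for n in range(n_pts))
        out.append(acc / n_pts if inverse else acc)
    return out


def output_envelope(s_env, h_lowpass, ts):
    """Steps 2 to 4: complex envelope of the output from that of the input."""
    n_pts = len(s_env)
    s_spec = dft(s_env)                                    # step 2
    # Frequency (Hz) of bin k, with the upper half mapped to negative values.
    freqs = [(k if k < n_pts // 2 else k - n_pts) / (n_pts * ts)
             for k in range(n_pts)]
    x_spec = [0.5 * h_lowpass(f) * s for f, s in zip(freqs, s_spec)]  # step 3
    return dft(x_spec, inverse=True)                       # step 4


# Hypothetical parameters chosen only for illustration.
FC = 250.0          # carrier frequency, Hz
TS = 1e-3           # sampling interval of the complex envelope, s
N = 64              # number of samples
K = 0.8             # channel gain
DELAY = 5           # channel delay in samples
T0 = DELAY * TS     # channel delay, s


def h_band(f):
    """Assumed band-pass response for f > 0: gain K and pure delay T0."""
    return K * cmath.exp(-2j * math.pi * f * T0)


def h_low(f):
    """Step 1: low-pass equivalent, H~(f) = 2 H(f + fc)."""
    return 2.0 * h_band(f + FC)


# Real low-pass message, so the complex envelope of m(t)cos(2 pi fc t) is m(t).
message = [math.exp(-((n - 20) / 4.0) ** 2) for n in range(N)]
x_env = output_envelope(message, h_low, TS)

# Step 5: band-pass output at the sample instants t = n * TS.
x_out = [(x_env[n] * cmath.exp(2j * math.pi * FC * n * TS)).real
         for n in range(N)]
```

For this assumed channel the output envelope is the message delayed by `DELAY` samples and multiplied by the constant $K\exp(-j2\pi f_c T_0)$, so the carrier frequency enters only through a fixed phase factor and never has to be sampled during steps 2 to 4.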


To summarize, the frequency-domain procedure described in Table 2.4 is well suited for the efficient simulation of communication systems on a computer for two reasons:

1. The low-pass equivalents of the incoming band-pass signal and the band-pass system eliminate the exponential factor $\exp(j2\pi f_c t)$ from the computation without loss of information.

2. The fast Fourier transform (FFT) algorithm, discussed later in the chapter, is used for numerical computation of the Fourier transform. This algorithm is used twice in Table 2.4, once in step 2 to perform Fourier transformation, and then again in step 4 to perform inverse Fourier transformation.

The procedure of this table, rooted largely in the frequency domain, assumes availability of the band-pass system's frequency response $H(f)$. If, however, it is the system's impulse response $h(t)$ that is known, then all we need is an additional step to Fourier transform $h(t)$ into $H(f)$ before initiating the procedure of Table 2.4.

Linear Modulation Theory 


The material presented in Sections 2.8-2.13 on the complex low-pass representation of 
band-pass signals and systems is of profound importance in the study of communication 
theory. In particular, we may use the canonical formula of (2.68) as the mathematical basis 
for a unified treatment of linear modulation theory, which is the subject matter of this 
section. 


We start this treatment with a formal definition:


The message signal (e.g., voice, video, data sequence) is referred to as the modulating 
signal , and the result of the modulation process is referred to as the modulated signal. 
Naturally, in a communication system, modulation is performed in the transmitter. The 
reverse of modulation, aimed at recovery of the original message signal in the receiver, is 
called demodulation. 

Consider the block diagram of Figure 2.23, depicting a modulator, where $m(t)$ is the message signal, $\cos(2\pi f_c t)$ is the carrier, and $s(t)$ is the modulated signal. To apply (2.68) to this modulator, the in-phase component $s_I(t)$ in that equation is treated simply as a scaled version of the message signal $m(t)$. As for the quadrature component $s_Q(t)$, it is defined as a spectrally shaped version of $m(t)$, with the shaping performed linearly. In such a scenario, it follows that a modulated signal $s(t)$ defined by (2.68) is a linear function of the message signal $m(t)$; hence the reference to this equation as the mathematical basis of linear modulation theory.


Figure 2.23: Block diagram of a modulator. Inputs: message signal $m(t)$ and carrier $\cos(2\pi f_c t)$; output: modulated signal $s(t)$.


To recover the original message signal $m(t)$ from the modulated signal $s(t)$, we may use a demodulator, the block diagram of which is depicted in Figure 2.24. An elegant feature of linear modulation theory is that demodulation of $s(t)$ is also achieved using linear operations. However, for linear demodulation of $s(t)$ to be feasible, the locally generated carrier in the demodulator of Figure 2.24 has to be synchronous with the original sinusoidal carrier used in the modulator of Figure 2.23. Accordingly, we speak of synchronous demodulation or coherent detection.


Figure 2.24: Block diagram of a demodulator. Inputs: modulated signal $s(t)$ and locally generated carrier $\cos(2\pi f_c t)$; output: demodulated signal $m(t)$.




Depending on the spectral composition of the modulated signal, we have three kinds of 
linear modulation in analog communications: 

• double sideband-suppressed carrier (DSB-SC) modulation; 

• vestigial sideband (VSB) modulation; 

• single sideband (SSB) modulation. 

These three methods of modulation are discussed in what follows and in this order. 


DSB-SC modulation is the simplest form of linear modulation, which is obtained by setting

$$s_I(t) = m(t)$$

and

$$s_Q(t) = 0$$

Accordingly, (2.68) is reduced to

$$s(t) = m(t)\cos(2\pi f_c t) \tag{2.97}$$

the implementation of which simply requires a product modulator that multiplies the message signal $m(t)$ by the carrier $\cos(2\pi f_c t)$, assumed to be of unit amplitude.

For a frequency-domain description of the DSB-SC-modulated signal defined in (2.97), suppose that the message signal $m(t)$ occupies the frequency band $-W \le f \le W$, as depicted in Figure 2.25a; hereafter, $W$ is referred to as the message bandwidth. Then, provided that the carrier frequency satisfies the condition $f_c > W$, we find that the spectrum of the DSB-SC-modulated signal consists of an upper sideband and lower sideband, as depicted in Figure 2.25b. Comparing the two parts of this figure, we immediately see that the channel bandwidth, $B$, required to support the transmission of the DSB-SC-modulated signal from the transmitter to the receiver is twice the message bandwidth.
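As a short worked step, the bandwidth statement can be read off from the spectrum of the modulated signal. The sketch below assumes only the standard modulation (frequency-shifting) property of the Fourier transform, with $M(f)$ denoting the Fourier transform of $m(t)$:

```latex
S(f) = \frac{1}{2}\left[ M(f - f_c) + M(f + f_c) \right]

% For positive frequencies, S(f) is nonzero only for f_c - W \le f \le f_c + W:
%   lower sideband:  f_c - W \le f < f_c
%   upper sideband:  f_c < f \le f_c + W
B = (f_c + W) - (f_c - W) = 2W
```

The condition $f_c > W$ keeps the two shifted replicas $M(f - f_c)$ and $M(f + f_c)$ from overlapping.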



Figure 2.25: (a) Message spectrum. (b) Spectrum of DSB-SC modulated wave $s(t)$, assuming $f_c > W$. In part (b), the spectrum $S(f)$ has peak value $\frac{1}{2}M(0)$ at $f = \pm f_c$ and occupies $f_c - W \le |f| \le f_c + W$, with the upper and lower sidebands on either side of $f_c$.

One other interesting point apparent from Figure 2.25b is that the spectrum of the DSB-SC modulated signal is entirely void of delta functions. This statement is further testimony to the fact that the carrier is suppressed from the generation of the modulated signal $s(t)$ of (2.97).

Summarizing the useful features of DSB-SC modulation: 

• suppression of the carrier, which results in saving of transmitted power; 

• desirable spectral characteristics, which make it applicable to the modulation of 
band-limited message signals; 

• ease of synchronizing the receiver to the transmitter for coherent detection. 

On the downside, DSB-SC modulation is wasteful of channel bandwidth, for the following reason. The two sidebands, constituting the spectral composition of the modulated signal $s(t)$, are actually the image of each other with respect to the carrier frequency $f_c$; hence, the transmission of either sideband is sufficient for transporting the message content of $s(t)$ across the channel.


In VSB modulation, one sideband is partially suppressed and a vestige of the other sideband is configured in such a way as to compensate for the partial sideband suppression, by exploiting the fact that the two sidebands in DSB-SC modulation are the image of each other. A popular method of achieving this design objective is to use the frequency discrimination method. Specifically, a DSB-SC-modulated signal is first generated using a product modulator, followed by a band-pass filter, as shown in Figure 2.26. The desired spectral shaping is thereby realized through the appropriate design of the band-pass filter.

Suppose that a vestige of the lower sideband is to be transmitted. Then, the frequency response of the band-pass filter, $H(f)$, takes the form shown in Figure 2.27; to simplify matters, only the frequency response for positive frequencies is shown in the figure. Examination of this figure reveals two characteristics of the band-pass filter:

1. Normalization of the frequency response, which means that

$$H(f) = \begin{cases} 1 & \text{for } f_c + f_v \le |f| < f_c + W \\ \dfrac{1}{2} & \text{for } |f| = f_c \end{cases}$$

where $f_v$ is the vestigial bandwidth and the other parameters are as previously defined.

2. Odd symmetry of the cutoff portion inside the transition interval $f_c - f_v < |f| < f_c + f_v$, which means that values of the frequency response $H(f)$ at any two frequencies equally spaced above and below the carrier frequency add up to unity.


Figure 2.26: Frequency-discrimination method for producing VSB modulation, where the intermediate signal $s_I(t)$ is DSB-SC modulated. Inputs: message signal $m(t)$ and carrier $\cos(2\pi f_c t)$.

Figure 2.27: Magnitude response of VSB filter; only the positive-frequency portion is shown.

Consequently, we find that shifted versions of the frequency response $H(f)$ satisfy the condition

$$H(f - f_c) + H(f + f_c) = 1 \quad \text{for } -W \le f \le W \tag{2.99}$$

Outside the frequency band of interest, that is, for $|f| > f_c + W$, the frequency response $H(f)$ can assume arbitrary values. We may thus express the channel bandwidth required for the transmission of VSB-modulated signals as

$$B = W + f_v$$

With this background, we now address the issue of how to specify $H(f)$. We first use the canonical formula of (2.68) to express the VSB-modulated signal $s_1(t)$, containing a vestige of the lower sideband, as

$$s_1(t) = \frac{1}{2}m(t)\cos(2\pi f_c t) - \frac{1}{2}m_Q(t)\sin(2\pi f_c t) \tag{2.101}$$

where $m(t)$ is the message signal, as before, and $m_Q(t)$ is the spectrally shaped version of $m(t)$; the reason for the factor 1/2 will become apparent later. Note that if $m_Q(t)$ is set equal to zero, (2.101) reduces to DSB-SC modulation. It is therefore in the quadrature signal $m_Q(t)$ that VSB modulation distinguishes itself from DSB-SC modulation. In particular, the role of $m_Q(t)$ is to interfere with the message signal $m(t)$ in such a way that power in one of the sidebands of the VSB-modulated signal $s(t)$ (e.g., the lower sideband in Figure 2.27) is appropriately reduced.

To determine $m_Q(t)$, we examine two different procedures:

1. Phase-discrimination, which is rooted in the time-domain description of (2.101); transforming this equation into the frequency domain, we obtain

$$S_1(f) = \frac{1}{4}[M(f - f_c) + M(f + f_c)] - \frac{1}{4j}[M_Q(f - f_c) - M_Q(f + f_c)] \tag{2.102}$$

where

$$M(f) = \mathbf{F}[m(t)] \quad \text{and} \quad M_Q(f) = \mathbf{F}[m_Q(t)]$$

2. Frequency-discrimination, which is structured in the manner described in Figure 2.26; passing the DSB-SC-modulated signal (i.e., the intermediate signal $s_I(t)$ in Figure 2.26) through the band-pass filter, we write

$$S_1(f) = \frac{1}{2}[M(f - f_c) + M(f + f_c)]H(f) \tag{2.103}$$

In both (2.102) and (2.103), the spectrum $S_1(f)$ is defined in the frequency interval

$$f_c - W \le |f| \le f_c + W$$

Equating the right-hand sides of these two equations, we get (after canceling common terms)

$$\frac{1}{2}[M(f - f_c) + M(f + f_c)] - \frac{1}{2j}[M_Q(f - f_c) - M_Q(f + f_c)] = [M(f - f_c) + M(f + f_c)]H(f) \tag{2.104}$$

Shifting both sides of (2.104) to the left by the amount $f_c$, we get (after canceling common terms)

$$\frac{1}{2}M(f) - \frac{1}{2j}M_Q(f) = M(f)H(f + f_c), \quad -W \le f \le W \tag{2.105}$$

where the terms $M(f + 2f_c)$ and $M_Q(f + 2f_c)$ are ignored as they both lie outside the interval $-W \le f \le W$. Next, shifting both sides of (2.104) by the amount $f_c$, but this time to the right, we get (after canceling common terms)

$$\frac{1}{2}M(f) + \frac{1}{2j}M_Q(f) = M(f)H(f - f_c), \quad -W \le f \le W \tag{2.106}$$

where, this time, the terms $M(f - 2f_c)$ and $M_Q(f - 2f_c)$ are ignored as they both lie outside the interval $-W \le f \le W$.

Given (2.105) and (2.106), all that remains to be done now is to follow two simple steps:

1. Adding these two equations and then factoring out the common term $M(f)$, we get the condition of (2.99) previously imposed on $H(f)$; indeed, it is with this condition in mind that we introduced the scaling factor 1/2 in (2.101).

2. Subtracting (2.105) from (2.106) and rearranging terms, we get the desired relationship between $M_Q(f)$ and $M(f)$:

$$M_Q(f) = j[H(f - f_c) - H(f + f_c)]M(f), \quad -W \le f \le W \tag{2.107}$$


Let $H_Q(f)$ denote the frequency response of a quadrature filter that operates on the message spectrum $M(f)$ to produce $M_Q(f)$. In light of (2.107), we may readily define $H_Q(f)$ in terms of $H(f)$ as

$$H_Q(f) = \frac{M_Q(f)}{M(f)} = j[H(f - f_c) - H(f + f_c)], \quad -W \le f \le W \tag{2.108}$$

Equation (2.108) provides the frequency-domain basis for the phase-discrimination method for generating the VSB-modulated signal $s_1(t)$, where only a vestige of the lower sideband is retained. With this equation at hand, it is instructive to plot the frequency response $H_Q(f)$. For the frequency interval $-W \le f \le W$, the term $H(f - f_c)$ is defined by the response $H(f)$ for negative frequencies shifted to the right by $f_c$, whereas the term $H(f + f_c)$ is defined by the response $H(f)$ for positive frequencies shifted to the left by $f_c$. Accordingly, building on the positive frequency response plotted in Figure 2.27, we find that the corresponding plot of $H_Q(f)$ is shaped as shown in Figure 2.28.

Figure 2.28: Frequency response of the quadrature filter for producing the quadrature component of the VSB wave. The vertical axis spans -1.0 to 1.0, and the vestigial bandwidth $f_v$ is marked on the frequency axis.
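The two filter relations used in this derivation, namely that the shifted responses $H(f - f_c)$ and $H(f + f_c)$ add up to unity over the message band and that the quadrature filter is $j$ times their difference, can be checked numerically. The sketch below assumes a piecewise-linear transition band and an even-symmetric response; the function names and the numerical values of `FC`, `FV`, and `W` are hypothetical and are not taken from the text.

```python
def vsb_filter(f, fc, fv, w):
    """Assumed piecewise-linear VSB response keeping a vestige of the lower sideband."""
    a = abs(f)  # even symmetry in f is assumed (real-valued response)
    if a < fc - fv or a > fc + w:
        return 0.0                            # stop band (values here are arbitrary)
    if a <= fc + fv:
        return (a - (fc - fv)) / (2.0 * fv)   # linear ramp through 1/2 at fc
    return 1.0                                # upper sideband passed unchanged


def shifted_sum(f, fc, fv, w):
    """H(f - fc) + H(f + fc), which should equal 1 for |f| <= W."""
    return vsb_filter(f - fc, fc, fv, w) + vsb_filter(f + fc, fc, fv, w)


def quadrature_response_over_j(f, fc, fv, w):
    """(1/j) H_Q(f) = H(f - fc) - H(f + fc), a real-valued quantity."""
    return vsb_filter(f - fc, fc, fv, w) - vsb_filter(f + fc, fc, fv, w)


FC, FV, W = 100.0, 2.0, 10.0               # hypothetical values in kHz, with FC > W
grid = [-W + 0.5 * i for i in range(41)]   # -10, -9.5, ..., 10
sums = [shifted_sum(f, FC, FV, W) for f in grid]
shape = [quadrature_response_over_j(f, FC, FV, W)
         for f in (-5.0, -1.0, 0.0, 1.0, 5.0)]
```

With these assumed values, every entry of `sums` equals 1, and `shape` runs from +1 below $-f_v$, linearly through 0 at the origin, to -1 above $f_v$. Letting $f_v$ shrink toward zero turns this ramp into a sign reversal at the origin.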


The discussion on VSB modulation has thus far focused on the case where a vestige of the lower sideband is transmitted. For the alternative case when a vestige of the upper sideband is transmitted, we find that the corresponding VSB-modulated wave is described by

$$s_2(t) = \frac{1}{2}m(t)\cos(2\pi f_c t) + \frac{1}{2}m_Q(t)\sin(2\pi f_c t) \tag{2.109}$$

where the quadrature signal $m_Q(t)$ is constructed from the message signal $m(t)$ in exactly the same way as before.

Equations (2.101) and (2.109) are of the same mathematical form, except for an algebraic sign difference; they may, therefore, be combined into the single formula

$$s(t) = \frac{1}{2}m(t)\cos(2\pi f_c t) \mp \frac{1}{2}m_Q(t)\sin(2\pi f_c t) \tag{2.110}$$

where the minus sign applies to a VSB-modulated signal containing a vestige of the lower sideband and the plus sign applies to the alternative case when the modulated signal contains a vestige of the upper sideband.

The formula of (2.110) for VSB modulation includes DSB-SC modulation as a special case. Specifically, setting $m_Q(t) = 0$, this formula reduces to that of (2.97) for DSB-SC modulation, except for the trivial scaling factor of 1/2.


Next, considering SSB modulation, we may identify two choices:

1. The carrier and the lower sideband are both suppressed, leaving the upper sideband for transmission in its full spectral content; this first SSB-modulated signal is denoted by $s_{\text{USB}}(t)$.

2. The carrier and the upper sideband are both suppressed, leaving the lower sideband for transmission in its full spectral content; this second SSB-modulated signal is denoted by $s_{\text{LSB}}(t)$.

The Fourier transforms of these two modulated signals are the image of each other with respect to the carrier frequency $f_c$, which, as mentioned previously, emphasizes that the transmission of either sideband is actually sufficient for transporting the message signal $m(t)$ over the communication channel. In practical terms, both $s_{\text{USB}}(t)$ and $s_{\text{LSB}}(t)$ require the smallest feasible channel bandwidth, $B = W$, without compromising the perfect recovery of the message signal under noiseless conditions. It is for these reasons that we say SSB modulation is the optimum form of linear modulation for analog communications, preserving both the transmitted power and channel bandwidth in the best manner possible.

Figure 2.29: Frequency response of the quadrature filter in SSB modulation.

SSB modulation may be viewed as a special case of VSB modulation. Specifically, setting the vestigial bandwidth $f_v = 0$, we find that the frequency response of the quadrature filter plotted in Figure 2.28 takes the limiting form of the signum function shown in Figure 2.29. In light of the material presented in (2.60) on Hilbert transformation, we therefore find that for $f_v = 0$ the quadrature component $m_Q(t)$ becomes the Hilbert transform of the message signal $m(t)$, denoted by $\hat{m}(t)$. Accordingly, using $\hat{m}(t)$ in place of $m_Q(t)$ in (2.110) yields the SSB formula

$$s(t) = \frac{1}{2}m(t)\cos(2\pi f_c t) \mp \frac{1}{2}\hat{m}(t)\sin(2\pi f_c t) \tag{2.111}$$

where the minus sign applies to the SSB-modulated signal $s_{\text{USB}}(t)$ and the plus sign applies to the alternative SSB-modulated signal $s_{\text{LSB}}(t)$.
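As a quick check of the SSB formula, consider a single-tone message. The tone $m(t) = A_m\cos(2\pi f_m t)$ with $f_m < f_c$ is an assumed test signal, not one used in the text; its Hilbert transform is $\hat{m}(t) = A_m\sin(2\pi f_m t)$. Substituting into the SSB formula and applying the cosine addition identities gives:

```latex
\begin{aligned}
s_{\text{USB}}(t) &= \tfrac{1}{2}A_m\left[\cos(2\pi f_m t)\cos(2\pi f_c t)
                     - \sin(2\pi f_m t)\sin(2\pi f_c t)\right]
                   = \tfrac{1}{2}A_m\cos\left[2\pi(f_c + f_m)t\right] \\
s_{\text{LSB}}(t) &= \tfrac{1}{2}A_m\left[\cos(2\pi f_m t)\cos(2\pi f_c t)
                     + \sin(2\pi f_m t)\sin(2\pi f_c t)\right]
                   = \tfrac{1}{2}A_m\cos\left[2\pi(f_c - f_m)t\right]
\end{aligned}
```

Each signal contains a single spectral line, at $f_c + f_m$ or $f_c - f_m$ respectively, which is consistent with the minus sign selecting the upper sideband and the plus sign selecting the lower sideband.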

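Unlike DSB-SC and VSB methods of modulation, SSB modulation is of limited applicability. Specifically, SSB modulation is applicable only to a message signal whose spectrum has an energy gap centered around the origin.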


This requirement, illustrated in Figure 2.30, is imposed on the message signal m(t ) so that 
the band-pass filter in the frequency-discrimination method of Figure 2.26 has a finite 
transition band for the filter to be physically realizable. With the transition band 
separating the pass-band from the stop-band, it is only when the transition band is finite 
that the undesired sideband can be suppressed. An example of message signals for which 
the energy-gap requirement is satisfied is voice signals; for such signals, the energy gap is 
about 600 Hz, extending from -300 to +300 Hz. 

In contrast, the spectral contents of television signals and wideband data extend down to practically a few hertz, thereby ruling out the applicability of SSB modulation to this second class of message signals. It is for this reason that VSB modulation is preferred over SSB modulation for the transmission of wideband signals.

Figure 2.30: Spectrum of a message signal $m(t)$ with an energy gap centered around the origin.


Equation (2.97) for DSB-SC modulation, (2.110) for VSB modulation, and (2.111) for 
SSB modulation are summarized in Table 2.5 as special cases of the canonical formula of 
(2.68). Correspondingly, we may treat the time-domain generations of these three linearly 
modulated signals as special cases of the “synthesizer” depicted in Figure 2.20b. 


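Table 2.5: Summary of linear modulation methods viewed as special cases of the canonical formula $s(t) = s_I(t)\cos(2\pi f_c t) - s_Q(t)\sin(2\pi f_c t)$

| Type of modulation | In-phase component $s_I(t)$ | Quadrature component $s_Q(t)$ | Comments |
|---|---|---|---|
| DSB-SC | $m(t)$ | zero | $m(t)$ = message signal |
| VSB | $\frac{1}{2}m(t)$ | $\pm\frac{1}{2}m_Q(t)$ | Plus sign applies to using vestige of lower sideband and minus sign applies to using vestige of upper sideband |
| SSB | $\frac{1}{2}m(t)$ | $\pm\frac{1}{2}\hat{m}(t)$ | Plus sign applies to transmission of upper sideband and minus sign applies to transmission of lower sideband |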


Phase and Group Delays 


A discussion of signal transmission through linear time-invariant systems is incomplete 
without considering the phase and group delays involved in the signal transmission 
process. 

Whenever a signal is transmitted through a dispersive system, exemplified by a communication channel (or band-pass filter), some delay is introduced into the output signal, the delay being measured with respect to the input signal. In an ideal channel, the phase response varies linearly with frequency inside the passband of the channel, in which case the filter introduces a constant delay equal to $t_0$, where the parameter $t_0$ controls the slope of the linear phase response of the channel. Now, what if the phase response of the channel is a nonlinear function of frequency, which is frequently the case in practice? The purpose of this section is to address this practical issue.

To begin the discussion, suppose that a steady sinusoidal signal at frequency $f_c$ is transmitted through a dispersive channel that has a phase-shift of $\beta(f_c)$ radians at that frequency. By using two phasors to represent the input signal and the received signal, we see that the received signal phasor lags the input signal phasor by $\beta(f_c)$ radians. The time taken by the received signal phasor to sweep out this phase lag is simply equal to the ratio $\beta(f_c)/(2\pi f_c)$ seconds. This time is called the phase delay of the channel.

It is important to realize, however, that the phase delay is not necessarily the true signal delay. This follows from the fact that a steady sinusoidal signal does not carry information, so it would be incorrect to deduce from the above reasoning that the phase delay is the true signal delay. To substantiate this statement, suppose that a slowly varying signal, over the interval $-(T/2) \le t \le (T/2)$, is multiplied by the carrier, so that the resulting modulated signal consists of a narrow group of frequencies centered around the carrier frequency; the DSB-SC waveform of Figure 2.31 illustrates such a modulated signal. When this modulated signal is transmitted through a communication channel, we find that there is indeed a delay between the envelope of the input signal and that of the received signal. This delay, called the envelope or group delay of the channel, represents the true signal delay insofar as the information-bearing signal is concerned.

Figure 2.31: (a) Block diagram of product modulator; (b) Baseband signal; (c) DSB-SC modulated wave.

Assume that the dispersive channel is described by the transfer function

$$H(f) = K\exp[j\beta(f)]$$

where the amplitude $K$ is a constant scaling factor and the phase $\beta(f)$ is a nonlinear function of frequency $f$; it is the nonlinearity of $\beta(f)$ that is responsible for the dispersive nature of the channel. The input signal $s(t)$ is assumed to be of the kind displayed in Figure 2.31; that is, the DSB-SC-modulated signal

$$s(t) = m(t)\cos(2\pi f_c t) \tag{2.113}$$

where $m(t)$ is the message signal, assumed to be of a low-pass kind and limited to the frequency interval $|f| \le W$. Moreover, we assume that the carrier frequency $f_c > W$. By expanding the phase $\beta(f)$ in a Taylor series about the point $f = f_c$ and retaining only the first two terms, we may approximate $\beta(f)$ as

$$\beta(f) \approx \beta(f_c) + (f - f_c)\left.\frac{\partial\beta(f)}{\partial f}\right|_{f = f_c} \tag{2.114}$$

Define two new terms:

$$\tau_p = -\frac{\beta(f_c)}{2\pi f_c}$$

and

$$\tau_g = -\frac{1}{2\pi}\left.\frac{\partial\beta(f)}{\partial f}\right|_{f = f_c}$$

Then, we may rewrite (2.114) in the equivalent form

$$\beta(f) \approx -2\pi f_c\tau_p - 2\pi(f - f_c)\tau_g$$

Correspondingly, the transfer function of the channel takes the approximate form

$$H(f) \approx K\exp[-j2\pi f_c\tau_p - j2\pi(f - f_c)\tau_g]$$

Following the band-pass-to-low-pass transformation described in Section 2.12, in particular using (2.80), we may replace the band-pass channel described by $H(f)$ by an equivalent low-pass filter whose transfer function is approximately given by

$$\tilde{H}(f) \approx 2K\exp(-j2\pi f_c\tau_p - j2\pi f\tau_g), \quad f > -f_c$$

Correspondingly, using (2.67) we may replace the modulated signal $s(t)$ of (2.113) by its low-pass complex envelope, which, for the DSB-SC example at hand, is simply defined by

$$\tilde{s}(t) = m(t)$$

Transforming $\tilde{s}(t)$ into the frequency domain, we may write

$$\tilde{S}(f) = M(f)$$

Therefore, in light of (2.96), the Fourier transform of the complex envelope of the signal received at the channel output is given by

$$\tilde{X}(f) = \frac{1}{2}\tilde{H}(f)\tilde{S}(f) \approx K\exp(-j2\pi f_c\tau_p)\exp(-j2\pi f\tau_g)M(f)$$

We note that the multiplying factor $K\exp(-j2\pi f_c\tau_p)$ is a constant for fixed values of $f_c$ and $\tau_p$. We also note from the time-shifting property of the Fourier transform that the term $\exp(-j2\pi f\tau_g)M(f)$ represents the Fourier transform of the delayed signal $m(t - \tau_g)$. Accordingly, the complex envelope of the channel output is

$$\tilde{x}(t) = K\exp(-j2\pi f_c\tau_p)m(t - \tau_g)$$


Finally, using (2.66) we find that the actual channel output is itself given by

$$x(t) = \text{Re}[\tilde{x}(t)\exp(j2\pi f_c t)] = Km(t - \tau_g)\cos[2\pi f_c(t - \tau_p)] \tag{2.124}$$

Equation (2.124) reveals that, as a result of transmitting the modulated signal $s(t)$ through the dispersive channel, two different delay effects occur at the channel output:

1. The sinusoidal carrier wave $\cos(2\pi f_c t)$ is delayed by $\tau_p$ seconds; hence, $\tau_p$ represents the phase delay; sometimes $\tau_p$ is referred to as the carrier delay.

2. The envelope $m(t)$ is delayed by $\tau_g$ seconds; hence, $\tau_g$ represents the envelope or group delay.

Note that $\tau_g$ is related to the slope of the phase $\beta(f)$, measured at $f = f_c$. Note also that when the phase response $\beta(f)$ varies linearly with frequency $f$ and passes through the origin, so that $\beta(0) = 0$, the phase delay and group delay assume a common value. It is only then that we can think of these two delays being equal.
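The two delays can be evaluated numerically for any given phase response. The following is a minimal sketch: the phase delay is taken from the phase at the carrier frequency, and the group delay from a central-difference estimate of the phase slope there. The helper name `phase_and_group_delay`, the example phase response, and the values of `T0`, `PHI0`, and `FC` are hypothetical choices for illustration.

```python
import math


def phase_and_group_delay(beta, fc, df=1e-3):
    """Return (tau_p, tau_g) in seconds for a phase response beta(f) in radians."""
    tau_p = -beta(fc) / (2.0 * math.pi * fc)
    # Central-difference estimate of the slope of beta(f) at f = fc.
    slope = (beta(fc + df) - beta(fc - df)) / (2.0 * df)
    tau_g = -slope / (2.0 * math.pi)
    return tau_p, tau_g


T0 = 2e-3      # assumed slope parameter, s
PHI0 = 0.5     # assumed constant phase offset, rad
FC = 1000.0    # assumed carrier frequency, Hz


def beta_with_offset(f):
    """Linear phase with a nonzero intercept: beta(0) = -PHI0."""
    return -2.0 * math.pi * f * T0 - PHI0


def beta_through_origin(f):
    """Linear phase passing through the origin: beta(0) = 0."""
    return -2.0 * math.pi * f * T0


tau_p1, tau_g1 = phase_and_group_delay(beta_with_offset, FC)
tau_p2, tau_g2 = phase_and_group_delay(beta_through_origin, FC)
```

In the first case the group delay equals `T0` while the phase delay exceeds it by `PHI0 / (2 * pi * FC)`; in the second case, where the linear phase passes through the origin, the two delays coincide.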


Numerical Computation of the Fourier Transform 


The material presented in this chapter clearly testifies to the importance of the Fourier 
transform as a theoretical tool for the representation of deterministic signals and linear 
time-invariant systems, be they of the low-pass or band-pass kind. The importance of the 
Fourier transform is further enhanced by the fact that there exists a class of algorithms 
called FFT algorithms for numerical computation of the Fourier transform in an efficient 
manner. 

The FFT algorithm is derived from the discrete Fourier transform (DFT) in which, as the name implies, both time and frequency are represented in discrete form. The DFT provides an approximation to the Fourier transform. In order to properly represent the information content of the original signal, we have to take special care in performing the sampling operations involved in defining the DFT. A detailed treatment of the sampling process is presented in Chapter 6. For the present, it suffices to say that, given a band-limited signal, the sampling rate should be greater than twice the highest frequency component of the input signal. Moreover, if the samples are uniformly spaced by $T_s$ seconds, the spectrum of the signal becomes periodic, repeating every $f_s = 1/T_s$ Hz in accordance with (2.43). Let $N$ denote the number of frequency samples contained in the interval $f_s$. Hence, the frequency resolution involved in numerical computation of the Fourier transform is defined by

$$\Delta f = \frac{f_s}{N} = \frac{1}{NT_s} = \frac{1}{T}$$

where $T$ is the total duration of the signal.
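As a small numerical illustration of the frequency resolution, take an assumed sampling rate $f_s = 8$ kHz and $N = 1024$ samples (values chosen here only for the arithmetic, not taken from the text):

```latex
T_s = \frac{1}{f_s} = \frac{1}{8000\ \text{Hz}} = 125\ \mu\text{s}, \qquad
T = N T_s = 1024 \times 125\ \mu\text{s} = 0.128\ \text{s}, \qquad
\Delta f = \frac{f_s}{N} = \frac{8000}{1024}\ \text{Hz} = 7.8125\ \text{Hz} = \frac{1}{T}
```

Doubling the observation time $T$ at a fixed sampling rate therefore halves $\Delta f$.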

Consider then a finite data sequence $\{g_0, g_1, \ldots, g_{N-1}\}$. For brevity, we refer to this sequence as $g_n$, in which the subscript is the time index $n = 0, 1, \ldots, N - 1$. Such a sequence may represent the result of sampling an analog signal $g(t)$ at times $t = 0, T_s, \ldots, (N - 1)T_s$, where $T_s$ is the sampling interval. The ordering of the data sequence defines the sample time in that $g_0, g_1, \ldots, g_{N-1}$ denote samples of $g(t)$ taken at times $0, T_s, \ldots, (N - 1)T_s$, respectively. Thus we have

$$g_n = g(nT_s)$$

We formally define the DFT of $g_n$ as

$$G_k = \sum_{n=0}^{N-1} g_n \exp\left(-\frac{j2\pi}{N}kn\right), \quad k = 0, 1, \ldots, N - 1 \tag{2.127}$$

The sequence $\{G_0, G_1, \ldots, G_{N-1}\}$ is called the transform sequence. For brevity, we refer to this second sequence simply as $G_k$, in which the subscript is the frequency index $k = 0, 1, \ldots, N - 1$.

Correspondingly, we define the inverse discrete Fourier transform (IDFT) of $G_k$ as

$$g_n = \frac{1}{N}\sum_{k=0}^{N-1} G_k \exp\left(\frac{j2\pi}{N}kn\right), \quad n = 0, 1, \ldots, N - 1 \tag{2.128}$$

The DFT and the IDFT form a discrete transform pair. Specifically, given a data sequence $g_n$, we may use the DFT to compute the transform sequence $G_k$; and given the transform sequence $G_k$, we may use the IDFT to recover the original data sequence $g_n$. A distinctive feature of the DFT is that, for the finite summations defined in (2.127) and (2.128), there is no question of convergence.
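The transform pair of (2.127) and (2.128) translates directly into code. The sketch below is a brute-force evaluation of the two finite sums, written for clarity rather than speed; the function names `dft` and `idft` and the four-point test sequence are illustrative choices.

```python
import cmath
import math


def dft(g):
    """N-point DFT: G_k = sum over n of g_n * exp(-j*2*pi*k*n/N)."""
    n_pts = len(g)
    return [sum(g[n] * cmath.exp(-2j * math.pi * k * n / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]


def idft(big_g):
    """N-point IDFT: g_n = (1/N) * sum over k of G_k * exp(+j*2*pi*k*n/N)."""
    n_pts = len(big_g)
    return [sum(big_g[k] * cmath.exp(2j * math.pi * k * n / n_pts)
                for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


g = [1.0, 2.0, 0.0, -1.0]   # arbitrary four-point data sequence
G = dft(g)                  # approximately [2, 1 - 3j, 0, 1 + 3j]
g_back = idft(G)            # recovers g up to rounding error
```

Each of the $N$ outputs requires $N$ complex multiplications, so this direct evaluation costs on the order of $N^2$ operations.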

When discussing the DFT (and algorithms for its computation), the words "sample" and "point" are used interchangeably to refer to a sequence value. Also, it is common practice to refer to a sequence of length $N$ as an $N$-point sequence and to refer to the DFT of a data sequence of length $N$ as an $N$-point DFT.


We may visualize the DFT process described in (2.127) as a collection of N complex 
heterodyning and averaging operations, as shown in Figure 2.32a. We say that the 
heterodyning is complex in that samples of the data sequence are multiplied by complex 
exponential sequences. There is a total of N complex exponential sequences to be 
considered, corresponding to the frequency index k = 0, 1, ..., N - 1. Their periods have 
been selected in such a way that each complex exponential sequence has precisely an 
integer number of cycles in the total interval 0 to N - 1 . The zero-frequency response, 
corresponding to k = 0, is the only exception. 

For the interpretation of the IDFT process, described in (2.128), we may use the scheme shown in Figure 2.32b. Here we have a collection of $N$ complex signal generators, each of which produces the complex exponential sequence

$$\exp\left(\frac{j2\pi}{N}kn\right) = \cos\left(\frac{2\pi}{N}kn\right) + j\sin\left(\frac{2\pi}{N}kn\right) = \left\{\cos\left(\frac{2\pi}{N}kn\right), \sin\left(\frac{2\pi}{N}kn\right)\right\}, \quad k = 0, 1, \ldots, N - 1$$

Thus, in reality, each complex signal generator consists of a pair of generators that output a cosinusoidal and a sinusoidal sequence of $k$ cycles per observation interval. The output of each complex signal generator is weighted by the complex Fourier coefficient $G_k$. At each time index $n$, an output is formed by summing the weighted complex generator outputs.

It is noteworthy that although the DFT and the IDFT are similar in their mathematical formulations, as described in (2.127) and (2.128), their interpretations as depicted in Figure 2.32a and b are completely different.


Figure 2.32: Interpretations of (a) the DFT and (b) the IDFT.

Also, the addition of harmonically related periodic signals, involved in these two parts of the figure, suggests that their outputs $G_k$ and $g_n$ must both be periodic. Moreover, the processors shown in Figure 2.32 are linear, suggesting that the DFT and IDFT are both linear operations. This important property is also obvious from the defining equations (2.127) and (2.128).


In the DFT both the input and the output consist of sequences of numbers defined at 
uniformly spaced points in time and frequency, respectively. This feature makes the DFT 
ideally suited for direct numerical evaluation on a computer. Moreover, the computation 
can be implemented most efficiently using a class of algorithms, collectively called FFT 
algorithms. An algorithm refers to a “recipe” that can be written in the form of a computer 
program. 

FFT algorithms are efficient because they use a greatly reduced number of arithmetic 
operations as compared with the brute force (i.e., direct) computation of the DFT. 
Basically, an FFT algorithm attains its computational efficiency by following the 
engineering strategy of “divide and conquer,” whereby the original DFT computation is 
decomposed successively into smaller DFT computations. In this section, we describe one 
version of a popular FFT algorithm, the development of which is based on such a strategy. 

To proceed with the development, we first rewrite (2.127), defining the DFT of $g_n$, in 
the convenient mathematical form 

$$G_k = \sum_{n=0}^{N-1} g_n W^{kn}, \qquad k = 0, 1, \ldots, N-1 \tag{2.130}$$

where we have introduced the complex parameter 

$$W = \exp\left(-\frac{j2\pi}{N}\right) \tag{2.131}$$

From this definition, we readily see that 

$$W^N = 1$$

$$W^{N/2} = -1$$

$$W^{(k+lN)(n+mN)} = W^{kn}, \qquad (m, l) = 0, \pm 1, \pm 2, \ldots$$

That is, $W^{kn}$ is periodic with period N. The periodicity of $W^{kn}$ is a key feature in the 
development of FFT algorithms. 
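
The three properties of W can be confirmed numerically. The sketch below assumes N = 8 and a particular pair of indices (k = 3, n = 5), both chosen only for illustration; the helper `near` is likewise hypothetical.

```python
import cmath

N = 8
W = cmath.exp(-2j * cmath.pi / N)  # W = exp(-j*2*pi/N)

def near(a, b, tol=1e-9):
    """True when two complex numbers agree within a tolerance."""
    return abs(a - b) < tol

w_N_is_one = near(W ** N, 1)
w_half_is_minus_one = near(W ** (N // 2), -1)

# Periodicity: shifting k or n by integer multiples of N leaves W^(kn) unchanged.
k, n = 3, 5
periodic = all(
    near(W ** ((k + l * N) * (n + m * N)), W ** (k * n))
    for l in (-2, -1, 0, 1, 2)
    for m in (-2, -1, 0, 1, 2)
)

print(w_N_is_one, w_half_is_minus_one, periodic)
```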

Let N, the number of points in the data sequence, be an integer power of two, as shown 
by 

$$N = 2^L$$

where L is an integer; the rationale for this choice is explained later. Since N is an even 
integer, N/2 is an integer, and so we may divide the data sequence into the first half and 
last half of the points. 




Thus, we may rewrite (2.130) as 

$$\begin{aligned}
G_k &= \sum_{n=0}^{(N/2)-1} g_n W^{kn} + \sum_{n=N/2}^{N-1} g_n W^{kn} \\
    &= \sum_{n=0}^{(N/2)-1} g_n W^{kn} + \sum_{n=0}^{(N/2)-1} g_{n+N/2} W^{k(n+N/2)} \\
    &= \sum_{n=0}^{(N/2)-1} \left( g_n + g_{n+N/2} W^{kN/2} \right) W^{kn}, \qquad k = 0, 1, \ldots, N-1
\end{aligned} \tag{2.132}$$

Since $W^{N/2} = -1$, we have 

$$W^{kN/2} = (-1)^k$$

Accordingly, the factor $W^{kN/2}$ in (2.132) takes on only one of two possible values, namely 
+1 or -1, depending on whether the frequency index k is even or odd, respectively. These 
two cases are considered in what follows. 

First, let k be even, so that $W^{kN/2} = 1$. Also let 

$$k = 2l, \qquad l = 0, 1, \ldots, \frac{N}{2} - 1$$

and define 

$$x_n = g_n + g_{n+N/2} \tag{2.133}$$

Then, we may put (2.132) into the new form 

$$\begin{aligned}
G_{2l} &= \sum_{n=0}^{(N/2)-1} x_n W^{2ln} \\
       &= \sum_{n=0}^{(N/2)-1} x_n \left( W^2 \right)^{ln}, \qquad l = 0, 1, \ldots, \frac{N}{2} - 1
\end{aligned} \tag{2.134}$$


From the definition of W given in (2.131), we readily see that 

$$W^2 = \exp\left(-\frac{j4\pi}{N}\right) = \exp\left(-\frac{j2\pi}{N/2}\right)$$

Hence, we recognize the sum on the right-hand side of (2.134) as the (N/2)-point DFT of 
the sequence $x_n$. 

Next, let k be odd so that $W^{kN/2} = -1$. Also, let 

$$k = 2l + 1, \qquad l = 0, 1, \ldots, \frac{N}{2} - 1$$

and define 

$$y_n = g_n - g_{n+N/2} \tag{2.135}$$

Then, we may put (2.132) into the corresponding form 

$$\begin{aligned}
G_{2l+1} &= \sum_{n=0}^{(N/2)-1} y_n W^{(2l+1)n} \\
         &= \sum_{n=0}^{(N/2)-1} \left[ y_n W^n \right] \left( W^2 \right)^{ln}, \qquad l = 0, 1, \ldots, \frac{N}{2} - 1
\end{aligned} \tag{2.136}$$


We recognize the sum on the right-hand side of (2.136) as the (N/2)-point DFT of the 
sequence $y_n W^n$. The parameter $W^n$ associated with $y_n$ is called the twiddle factor. 

Equations (2.134) and (2.136) show that the even- and odd-valued samples of the 
transform sequence $G_k$ can be obtained from the (N/2)-point DFTs of the sequences $x_n$ and 
$y_n W^n$, respectively. The sequences $x_n$ and $y_n$ are themselves related to the original data 
sequence $g_n$ by (2.133) and (2.135), respectively. Thus, the problem of computing an 
N-point DFT is reduced to that of computing two (N/2)-point DFTs. The procedure just 
described is repeated a second time, whereby an (N/2)-point DFT is decomposed into two 
(N/4)-point DFTs. The decomposition procedure is continued in this fashion until (after 
$L = \log_2 N$ stages) we reach the trivial case of N single-point DFTs. 
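
The decomposition described above can be sketched as a short recursive routine. This is an illustrative implementation under stated assumptions, not the in-place signal-flow-graph form discussed in the text: it forms the sequences of (2.133) and (2.135), applies the twiddle factor, recurses on the two half-length sequences, and then interleaves the results so that the output appears in natural order. The names `fft_dif` and `dft` are hypothetical.

```python
import cmath

def fft_dif(g):
    """Radix-2 decimation-in-frequency FFT; len(g) must be a power of two."""
    N = len(g)
    if N < 1 or N & (N - 1):
        raise ValueError("sequence length must be a power of two")
    if N == 1:
        return [complex(g[0])]  # a single-point DFT is the sample itself
    half = N // 2
    W = cmath.exp(-2j * cmath.pi / N)
    # x_n = g_n + g_{n+N/2} feeds the even-indexed outputs, as in (2.133).
    x = [g[n] + g[n + half] for n in range(half)]
    # y_n = g_n - g_{n+N/2}, as in (2.135), multiplied by the twiddle factor W^n.
    y = [(g[n] - g[n + half]) * W ** n for n in range(half)]
    G = [0j] * N
    G[0::2] = fft_dif(x)  # G_{2l}
    G[1::2] = fft_dif(y)  # G_{2l+1}
    return G

def dft(g):
    """Direct DFT, used only as a reference for comparison."""
    N = len(g)
    return [sum(g[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

g = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
max_error = max(abs(a - b) for a, b in zip(fft_dif(g), dft(g)))
print(max_error)
```

The interleaving step trades the in-place property for readability; an in-place version would instead leave the outputs in bit-reversed order.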

Figure 2.33 illustrates the computations involved in applying the formulas of (2.134) 
and (2.136) to an eight-point data sequence; that is, N = 8. In constructing the left-hand 
portions of the figure, we have used signal-flow graph notation. A signal-flow graph 
consists of an interconnection of nodes and branches. The direction of signal transmission 
along a branch is indicated by an arrow. A branch multiplies the variable at a node (to 
which it is connected) by the branch transmittance. A node sums the outputs of all 
incoming branches. The convention used for branch transmittances in Figure 2.33 is as 
follows. When no coefficient is indicated on a branch, the transmittance of that branch is 
assumed to be unity. For other branches, the transmittance of a branch is indicated by -1 or 
an integer power of W, placed alongside the arrow on the branch. 

Thus, in Figure 2.33a the computation of an eight-point DFT is reduced to that of two 
four-point DFTs. The procedure for the eight-point DFT may be mimicked to simplify the 
computation of the four-point DFT. This is illustrated in Figure 2.33b, where the 
computation of a four-point DFT is reduced to that of two two-point DFTs. Finally, the 
computation of a two-point DFT is shown in Figure 2.33c. 

Combining the ideas described in Figure 2.33, we obtain the complete signal-flow 
graph of Figure 2.34 for the computation of the eight-point DFT. A repetitive structure, 
called the butterfly with two inputs and two outputs, can be discerned in the FFT algorithm 
of Figure 2.34. Examples of butterflies (for the three stages of the algorithm) are shown by 
the bold-faced lines in Figure 2.34. 

For the general case of $N = 2^L$, the algorithm requires $L = \log_2 N$ stages of computation. 
Each stage requires (N/2) butterflies. Each butterfly involves one complex multiplication 
and two complex additions (to be precise, one addition and one subtraction). Accordingly, 
the FFT structure described here requires $(N/2)\log_2 N$ complex multiplications and $N\log_2 N$ 
complex additions; actually, the number of multiplications quoted is pessimistic, because 
we may omit all twiddle factors $W^0 = 1$ and $W^{N/2} = -1$, $W^{N/4} = -j$, $W^{3N/4} = j$. This 
computational complexity is significantly smaller than that of the $N^2$ complex 
multiplications and N(N - 1) complex additions required for direct computation of the 
DFT. The computational savings made possible by the FFT algorithm become more 
substantial as we increase the data length N. For example, for $N = 8192 = 2^{13}$, the direct 
approach requires approximately 630 times as many arithmetic operations as the FFT 
algorithm, hence the popular use of the FFT algorithm in computing the DFT. 

Figure 2.33 (a) Reduction of eight-point DFT into two four-point DFTs. (b) Reduction of 
four-point DFT into two two-point DFTs. (c) Trivial case of two-point DFT. 

Figure 2.34 Decimation-in-frequency FFT algorithm. 
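
The operation counts quoted above are easy to tabulate. The sketch below simply evaluates the formulas, taking $N^2$ complex multiplications and N(N - 1) complex additions for the direct computation, and $(N/2)\log_2 N$ multiplications and $N\log_2 N$ additions for the FFT; the function names are hypothetical, and no twiddle-factor savings are taken into account.

```python
def direct_dft_ops(N):
    """(complex multiplications, complex additions) for direct DFT evaluation."""
    return N * N, N * (N - 1)

def fft_ops(N):
    """(complex multiplications, complex additions) for the radix-2 FFT."""
    L = N.bit_length() - 1  # L = log2(N) when N is a power of two
    if N < 1 or 2 ** L != N:
        raise ValueError("N must be a power of two")
    return (N // 2) * L, N * L

for N in (8, 1024, 8192):
    print(N, direct_dft_ops(N), fft_ops(N))
```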

We may establish two other important features of the FFT algorithm by carefully 
examining the signal-flow graph shown in Figure 2.34: 

1. At each stage of the computation, the new set of N complex numbers resulting from 
the computation can be stored in the same memory locations used to store the 
previous set. This kind of computation is referred to as in-place computation. 

2. The samples of the transform sequence $G_k$ are stored in a bit-reversed order. To 
illustrate the meaning of this terminology, consider Table 2.6 constructed for the 
case of N = 8. At the left of the table, we show the eight possible values of the 
frequency index k (in their natural order) and their 3-bit binary representations. At 
the right of the table, we show the corresponding bit-reversed binary representations 
and indices. We observe that the bit-reversed indices in the rightmost column of 
Table 2.6 appear in the same order as the indices at the output of the FFT algorithm 
in Figure 2.34. 
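
The bit-reversed ordering described above can be generated with a few lines of code. This is a minimal sketch for N = 8; the helper name `bit_reverse` is hypothetical.

```python
def bit_reverse(k, bits):
    """Reverse the lowest `bits` bits of the non-negative integer k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)  # shift the result left, append the lowest bit of k
        k >>= 1
    return r

N = 8
bits = N.bit_length() - 1  # 3 bits for N = 8
order = [bit_reverse(k, bits) for k in range(N)]

# Columns: index k, binary of k, bit-reversed binary, bit-reversed index.
for k in range(N):
    print(f"{k}  {k:0{bits}b}  {order[k]:0{bits}b}  {order[k]}")
```

For N = 8 the list `order` is `[0, 4, 2, 6, 1, 5, 3, 7]`.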


Table 2.6 Illustrating bit reversal 

The FFT algorithm depicted in Figure 2.34 is referred to as a decimation-in-frequency 
algorithm, because the transform (frequency) sequence $G_k$ is divided successively into 
smaller subsequences. In another popular FFT algorithm, called a decimation-in-time 
algorithm, the data (time) sequence $g_n$ is divided successively into smaller subsequences. 
Both algorithms have the same computational complexity. They differ from each other in 
two respects. First, for decimation-in-frequency, the input is in natural order, whereas the 
output is in bit-reversed order; the reverse is true for decimation-in-time. Second, the 
butterfly for decimation-in-time is slightly different from that for decimation-in-frequency. 
The reader is invited to derive the details of the decimation-in-time algorithm 
using the divide-and-conquer strategy that led to the development of the algorithm 
described in Figure 2.34. 

In devising the FFT algorithm presented herein, we placed the factor 1/N in the formula 
for the inverse DFT, as shown in (2.128). In some other FFT algorithms, the location of the 
factor 1/N is reversed. In yet other formulations, the factor $1/\sqrt{N}$ is placed in the 
formulas for both the forward and inverse DFTs for the sake of symmetry. 
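
The three scaling conventions mentioned above differ only in where the factor is applied; each still yields a consistent transform pair. The following sketch illustrates this with a direct (non-FFT) evaluation; the convention labels `"inverse"`, `"forward"`, and `"symmetric"` and the function names are assumptions made for this example only.

```python
import cmath
import math

def scaled_dft(g, scale, sign):
    """Sum g_n * exp(sign*j*2*pi*k*n/N) over n, multiplied by `scale`."""
    N = len(g)
    return [scale * sum(g[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N)
                        for n in range(N))
            for k in range(N)]

def transform_pair(g, convention):
    """Return (G, g_recovered) under the chosen placement of the scale factor."""
    N = len(g)
    scales = {
        "inverse": (1.0, 1.0 / N),       # factor 1/N in the inverse DFT
        "forward": (1.0 / N, 1.0),       # factor 1/N in the forward DFT
        "symmetric": (1.0 / math.sqrt(N), 1.0 / math.sqrt(N)),
    }
    forward_scale, inverse_scale = scales[convention]
    G = scaled_dft(g, forward_scale, -1)
    g_recovered = scaled_dft(G, inverse_scale, +1)
    return G, g_recovered

g = [1.0, -2.0, 0.5, 3.0]
errors = {}
for convention in ("inverse", "forward", "symmetric"):
    _, g_back = transform_pair(g, convention)
    errors[convention] = max(abs(a - b) for a, b in zip(g_back, g))
print(errors)
```

Only the product of the two scale factors, which must equal 1/N, matters for recovering the data sequence; the transform values themselves differ by a constant between conventions.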


The IDFT of the transform $G_k$ is defined by (2.128). We may rewrite this equation in terms 
of the complex parameter W as 

$$g_n = \frac{1}{N} \sum_{k=0}^{N-1} G_k W^{-kn}, \qquad n = 0, 1, \ldots, N-1 \tag{2.137}$$

Figure 2.35 Use of the FFT algorithm for computing the IDFT. Blocks, from the input $G_k$ to 
the output $g_n$: complex conjugate, FFT, complex conjugate, divide by N. 


Taking the complex conjugate of (2.137) and multiplying by N, we get 

$$N g_n^* = \sum_{k=0}^{N-1} G_k^* W^{kn}, \qquad n = 0, 1, \ldots, N-1 \tag{2.138}$$

The right-hand side of (2.138) is recognized as the N-point DFT of the complex-conjugated 
sequence $G_k^*$. Accordingly, (2.138) suggests that we may compute the desired 
sequence $g_n$ using the scheme shown in Figure 2.35, based on an N-point FFT algorithm. 
Thus, the same FFT algorithm can be used to handle the computation of both the IDFT 
and the DFT. 
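
The conjugation scheme can be sketched as follows. For brevity the forward transform here is a direct DFT standing in for an N-point FFT routine; any forward FFT with the same sign convention could be substituted. The function names are hypothetical.

```python
import cmath

def dft(g):
    """Forward DFT with kernel W^(kn), W = exp(-j*2*pi/N); stands in for an FFT."""
    N = len(g)
    return [sum(g[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft_via_forward(G):
    """IDFT computed as: conjugate, forward transform, conjugate, divide by N."""
    N = len(G)
    conjugated_input = [complex(z).conjugate() for z in G]
    forward = dft(conjugated_input)               # equals N * conj(g_n)
    return [z.conjugate() / N for z in forward]   # undo conjugation and scaling

g = [1 + 0j, 2 - 1j, 0j, -3j]
recovered = idft_via_forward(dft(g))
max_error = max(abs(a - b) for a, b in zip(recovered, g))
print(max_error)
```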

Summary and Discussion 


In this chapter we have described the Fourier transform as a fundamental tool for relating 
the time-domain and frequency-domain descriptions of a deterministic signal. The signal 
of interest may be an energy signal or a power signal. The Fourier transform includes the 
exponential Fourier series as a special case, provided that we permit the use of the Dirac 
delta function. 

An inverse relationship exists between the time-domain and frequency-domain 
descriptions of a signal. Whenever an operation is performed on the waveform of a signal 
in the time domain, a corresponding modification is applied to the spectrum of the signal 
in the frequency domain. An important consequence of this inverse relationship is the fact 
that the time-bandwidth product of an energy signal is a constant; the definitions of signal 
duration and bandwidth merely affect the value of the constant. 

An important signal-processing operation frequently encountered in communication 
systems is that of linear filtering. This operation involves the convolution of the input 
signal with the impulse response of the filter or, equivalently, the multiplication of the 
Fourier transform of the input signal by the transfer function (i.e., Fourier transform of the 
impulse response) of the filter. Low-pass and band-pass filters represent two commonly 
used types of filters. Band-pass filtering is usually more complicated than low-pass 
filtering. However, through the combined use of a complex envelope for the representation 
of an input band-pass signal and the complex impulse response for the representation of a 
band-pass filter, we may formulate a complex low-pass equivalent for the band-pass 
filtering problem and thereby replace a difficult problem with a much simpler one. It is 
also important to note that there is no loss of information in establishing this equivalence. 
A rigorous treatment of the concepts of complex envelope and complex impulse response 
as presented in this chapter is rooted in Hilbert transformation. 

The material on Fourier analysis, as presented in this chapter, deals with signals whose 
waveforms can be nonperiodic or periodic, and whose spectra can be continuous or 
discrete functions of frequency. In this sense, the material has general appeal. 

Building on the canonical representation of a band-pass signal involving the in-phase 
and quadrature components of the signal, we showed that this representation provides an 
elegant way of describing the three basic forms of linear modulation, namely DSB-SC, 
VSB, and SSB. 

With the Fourier transform playing such a pervasive role in the study of signals and 
linear systems, we finally described the FFT algorithm as an efficient tool for numerical 
computation of the DFT that represents the uniformly sampled versions of the forward and 
inverse forms of the ordinary Fourier transform. 


Problems 

The Fourier Transform 

Prove the dilation property of the Fourier transform, listed as Property 2 in Table 2.1. 


Prove the duality property of the Fourier transform, listed as Property 3 in Table 2.1. 

Prove the time-shifting property, listed as Property 4; and then use the duality property to prove 
the frequency-shifting property, listed as Property 5 in the table. 

Using the frequency-shifting property, determine the Fourier transform of the radio frequency (RF) 
pulse 

$$g(t) = A\,\mathrm{rect}\left(\frac{t}{T}\right)\cos(2\pi f_c t)$$

assuming that $f_c$ is larger than (1/T). 

Prove the multiplication-in-the-time-domain property of the Fourier transform, listed as Property 
11 in Table 2.1. 

Prove the convolution in the time-domain property, listed as Property 12. 

Using the result obtained in part b, prove the correlation theorem, listed as Property 13. 

Prove Rayleigh’s energy theorem listed as Property 14 in Table 2.1. 

The following expression may be viewed as an approximate representation of a pulse with finite rise 


where it is assumed that $T \gg \tau$. Determine the Fourier transform of g(t). What happens to this 
transform when we allow $\tau$ to become zero? Hint: Express g(t) as the superposition of two signals, 
one corresponding to integration from t - T to 0, and the other from 0 to t + T. 

The Fourier transform of a signal g(t) is denoted by G(f). Prove the following properties of the 
Fourier transform: 

If a real signal g(t) is an even function of time t, the Fourier transform G(f) is purely real. If a 
real signal g(t) is an odd function of time t, the Fourier transform G(f) is purely imaginary. 


time: 




where $G^{(n)}(f)$ is the nth derivative of G(f) with respect to f. 

Assuming that both $g_1(t)$ and $g_2(t)$ are complex signals, show that: 

$$g_1(t)\,g_2^*(t) \rightleftharpoons \int_{-\infty}^{\infty} G_1(\lambda)\,G_2^*(\lambda - f)\,d\lambda$$

and 

$$\int_{-\infty}^{\infty} g_1(t)\,g_2^*(t)\,dt = \int_{-\infty}^{\infty} G_1(f)\,G_2^*(f)\,df$$


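The root mean-square (rms) bandwidth of a low-pass signal g(t) of finite energy is defined by 

$$W_{\text{rms}} = \left[ \frac{\int_{-\infty}^{\infty} f^2 |G(f)|^2\,df}{\int_{-\infty}^{\infty} |G(f)|^2\,df} \right]^{1/2}$$

where $|G(f)|^2$ is the energy spectral density of the signal. Correspondingly, the root mean-square 
(rms) duration of the signal is defined by 

$$T_{\text{rms}} = \left[ \frac{\int_{-\infty}^{\infty} t^2 |g(t)|^2\,dt}{\int_{-\infty}^{\infty} |g(t)|^2\,dt} \right]^{1/2}$$

Using these definitions, show that 

$$T_{\text{rms}} W_{\text{rms}} \ge \frac{1}{4\pi}$$

Assume that $|g(t)| \to 0$ faster than $1/\sqrt{|t|}$ as $|t| \to \infty$. 

Consider a Gaussian pulse defined by 

$$g(t) = \exp(-\pi t^2)$$

Show that for this signal the equality 

$$T_{\text{rms}} W_{\text{rms}} = \frac{1}{4\pi}$$

is satisfied. 

Hint: Use Schwarz’s inequality 

$$\left\{ \int_{-\infty}^{\infty} \left[ g_1^*(t)\,g_2(t) + g_1(t)\,g_2^*(t) \right] dt \right\}^2 \le 4 \int_{-\infty}^{\infty} |g_1(t)|^2\,dt \int_{-\infty}^{\infty} |g_2(t)|^2\,dt$$

in which we set 

$$g_1(t) = t\,g(t)$$

and 

$$g_2(t) = \frac{dg(t)}{dt}$$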


The Dirac comb, formulated in the time domain, is defined by 

$$\delta_{T_0}(t) = \sum_{m=-\infty}^{\infty} \delta(t - mT_0)$$

where $T_0$ is the period. 

Show that the Dirac comb is its own Fourier transform. That is, the Fourier transform of $\delta_{T_0}(t)$ is 
also an infinitely long periodic train of delta functions, weighted by the factor $f_0 = (1/T_0)$ and 
regularly spaced by $f_0$ along the frequency axis. 

Hence, prove the pair of dual relations: 

$$\sum_{m=-\infty}^{\infty} \delta(t - mT_0) = f_0 \sum_{n=-\infty}^{\infty} \exp(j2\pi n f_0 t)$$

$$T_0 \sum_{m=-\infty}^{\infty} \exp(j2\pi m f T_0) = \sum_{n=-\infty}^{\infty} \delta(f - n f_0)$$

Finally, prove the validity of (2.38). 

Signal Transmission through Linear Time-invariant Systems 

The periodic signal 

$$x(t) = \sum_{n=-\infty}^{\infty} x(nT_0)\,\delta(t - nT_0)$$

is applied to a linear system of impulse response h(t). Show that the average power of the signal y(t) 
produced at the system output is defined by 

$$P_{\text{av},y} = \sum_{n=-\infty}^{\infty} |x(nT_0)|^2\,|H(nf_0)|^2$$

where H(f) is the frequency response of the system, and $f_0 = 1/T_0$. 

According to the bounded input-bounded output stability criterion, the impulse response h(t) of a 
linear time-invariant system must be absolutely integrable; that is, 

$$\int_{-\infty}^{\infty} |h(t)|\,dt < \infty$$

Prove that this condition is both necessary and sufficient for stability of the system. 


Hilbert Transform and Pre-envelopes 

Prove the three properties of the Hilbert transform itemized on pages 43 and 44. 

Let $\hat{g}(t)$ denote the Hilbert transform of g(t). Derive the set of Hilbert-transform pairs listed as 
items 5 to 8 in Table 2.3. 

Evaluate the inverse Fourier transform g(t) of the one-sided frequency function: 

$$G(f) = \begin{cases} \exp(-f), & f > 0 \\ \frac{1}{2}, & f = 0 \\ 0, & f < 0 \end{cases}$$

Show that g(t) is complex, and that its real and imaginary parts constitute a Hilbert-transform pair. 

Let $\hat{g}(t)$ denote the Hilbert transform of a Fourier transformable signal g(t). Show that $\frac{d}{dt}\hat{g}(t)$ is 
equal to the Hilbert transform of $\frac{d}{dt}g(t)$. 

In this problem, we revisit Problem 2.14, except that this time we use integration rather than 
differentiation. Doing so, we find that, in general, the integral $\int \hat{g}(\tau)\,d\tau$ is not equal to the Hilbert 
transform of the integral $\int g(\tau)\,d\tau$. 

Justify this statement. 

Find the condition for which exact equality holds. 

Determine the pre-envelope $g_+(t)$ corresponding to each of the following two signals: 

$$g(t) = \mathrm{sinc}(t)$$

$$g(t) = [1 + k\cos(2\pi f_m t)]\cos(2\pi f_c t)$$


Complex Envelope 

Show that the complex envelope of the sum of two narrowband signals (with the same carrier 
frequency) is equal to the sum of their individual complex envelopes. 

The definition of the complex envelope $\tilde{s}(t)$ of a band-pass signal given in (2.65) is based on the 
pre-envelope $s_+(t)$ for positive frequencies. How is the complex envelope defined in terms of the 
pre-envelope $s_-(t)$ for negative frequencies? Justify your answer. 

Consider the signal 

$$s(t) = c(t)\,m(t)$$

where m(t) is a low-pass signal whose Fourier transform M(f) vanishes for $|f| > W$, and c(t) is a 
high-pass signal whose Fourier transform C(f) vanishes for $|f| < W$. Show that the Hilbert transform 
of s(t) is $\hat{s}(t) = \hat{c}(t)\,m(t)$, where $\hat{c}(t)$ is the Hilbert transform of c(t). 

Consider two real-valued signals $s_1(t)$ and $s_2(t)$ whose pre-envelopes are denoted by $s_{1+}(t)$ and 
$s_{2+}(t)$, respectively. Show that 

$$\int_{-\infty}^{\infty} \mathrm{Re}[s_{1+}(t)]\,\mathrm{Re}[s_{2+}(t)]\,dt = \frac{1}{2}\,\mathrm{Re}\left[ \int_{-\infty}^{\infty} s_{1+}(t)\,s_{2+}^*(t)\,dt \right]$$

Suppose that $s_2(t)$ is replaced by $s_2(-t)$. Show that this modification has the effect of removing 
the complex conjugation in the right-hand side of the formula given in part a. 

Assuming that s(t) is a narrowband signal with complex envelope $\tilde{s}(t)$ and carrier frequency $f_c$, 
use the result of part a to show that 

$$\int_{-\infty}^{\infty} s^2(t)\,dt = \frac{1}{2} \int_{-\infty}^{\infty} |\tilde{s}(t)|^2\,dt$$

Let a narrowband signal s(t) be expressed in the form 

$$s(t) = s_I(t)\cos(2\pi f_c t) - s_Q(t)\sin(2\pi f_c t)$$

Using $S_+(f)$ to denote the Fourier transform of the pre-envelope $s_+(t)$, show that the Fourier 
transforms of the in-phase component $s_I(t)$ and quadrature component $s_Q(t)$ are given by 

$$S_I(f) = \frac{1}{2}\left[ S_+(f + f_c) + S_+^*(-f + f_c) \right]$$

$$S_Q(f) = \frac{1}{2j}\left[ S_+(f + f_c) - S_+^*(-f + f_c) \right]$$

respectively, where the asterisk denotes complex conjugation. 

The block diagram of Figure 2.20a illustrates a method for extracting the in-phase component $s_I(t)$ 
and quadrature component $s_Q(t)$ of a narrowband signal s(t). Given that the spectrum of s(t) is 
limited to the interval $f_c - W \le |f| \le f_c + W$, demonstrate the validity of this method. Hence, show that 

$$S_I(f) = \begin{cases} S(f - f_c) + S(f + f_c), & -W \le f \le W \\ 0, & \text{elsewhere} \end{cases}$$

and 

$$S_Q(f) = \begin{cases} j[S(f - f_c) - S(f + f_c)], & -W \le f \le W \\ 0, & \text{elsewhere} \end{cases}$$

where $S_I(f)$, $S_Q(f)$, and S(f) are the Fourier transforms of $s_I(t)$, $s_Q(t)$, and s(t), respectively. 

Low-Pass Equivalent Models of Band-Pass Systems 

Equations (2.82) and (2.83) define the in-phase component $\tilde{H}_I(f)$ and the quadrature component 
$\tilde{H}_Q(f)$ of the frequency response $\tilde{H}(f)$ of the complex low-pass equivalent model of a band-pass 
system of impulse response h(t). Prove the validity of these two equations. 

Explain what happens to the low-pass equivalent model of Figure 2.21b when the amplitude 
response of the corresponding band-pass filter has even symmetry and the phase response has odd 
symmetry with respect to the mid-band frequency $f_c$. 

The rectangular RF pulse 

$$x(t) = \begin{cases} A\cos(2\pi f_c t), & 0 \le t \le T \\ 0, & \text{elsewhere} \end{cases}$$

is applied to a linear filter with impulse response 

$$h(t) = x(T - t)$$

Assume that the frequency $f_c$ equals a large integer multiple of 1/T. Determine the response of the 
filter and sketch it. 

Figure P2.26 depicts the frequency response of an idealized band-pass filter in the receiver of a 
communication system, namely H(f), which is characterized by a bandwidth of 2B centered on the 
carrier frequency $f_c$. The signal applied to the band-pass filter is described by the modulated sinc 
function: 

$$x(t) = 4A_c B\,\mathrm{sinc}(2Bt)\cos[2\pi(f_c \pm \Delta f)t]$$

where $\Delta f$ is the frequency misalignment introduced due to the receiver’s imperfections, measured with 
respect to the carrier $A_c\cos(2\pi f_c t)$. 

Find the complex low-pass equivalent models of the signal x(t) and the frequency response H(f). 

Then, go on to find the complex low-pass response of the filter output, denoted by $\tilde{y}(t)$, which 
includes distortion due to $\pm\Delta f$. 

Building on the formula derived for $\tilde{y}(t)$ obtained in part b, explain how you would mitigate the 
misalignment distortion in the receiver. 


Nonlinear Modulations 


In analog communications, amplitude modulation is defined by 

$$s_{\text{AM}}(t) = A_c[1 + k_a m(t)]\cos(2\pi f_c t)$$

where $A_c\cos(2\pi f_c t)$ is the carrier, m(t) is the message signal, and $k_a$ is a constant called the amplitude 
sensitivity of the modulator. Assume that $|k_a m(t)| < 1$ for all time t. 

Justify the statement that, in a strict sense, $s_{\text{AM}}(t)$ violates the principle of superposition. 

Formulate the complex envelope $\tilde{s}_{\text{AM}}(t)$ and its spectrum. 

Compare the result obtained in part b with the complex envelope of DSB-SC. Hence, comment 
on the advantages and disadvantages of amplitude modulation. 


Continuing on with analog communications, frequency modulation (FM) is defined by 

$$s_{\text{FM}}(t) = A_c\cos\left[ 2\pi f_c t + k_f \int_0^t m(\tau)\,d\tau \right]$$

where $A_c\cos(2\pi f_c t)$ is the carrier, m(t) is the message signal, and $k_f$ is a constant called the 
frequency sensitivity of the modulator. 

Show that frequency modulation is nonlinear in that it violates the principle of superposition. 

Formulate the complex envelope of the FM signal, namely $\tilde{s}_{\text{FM}}(t)$. 

Consider the message signal to be in the form of a square wave as shown in Figure P2.28. The 
modulation frequencies used for the positive and negative amplitudes of the square wave, namely 
$f_1$ and $f_2$, are defined as follows: 

f\ + fl = y 

1 b 

f\~fi = y 

where $T_b$ is the duration of each positive or negative amplitude in the square wave. Show that 
under these conditions the complex envelope $\tilde{s}_{\text{FM}}(t)$ maintains continuity for all time t, 
including the switching times between positive and negative amplitudes. 

Plot the real and imaginary parts of $\tilde{s}_{\text{FM}}(t)$ for the following values: 


/, - i Hz 

h = ^ Hz 


Phase and Group Delays 

The phase response of a band-pass communication channel is defined by 

$$\phi(f) = -\tan^{-1}\left( \frac{f^2 - f_c^2}{f f_c} \right)$$

A sinusoidally modulated signal defined by 

$$s(t) = A_c\cos(2\pi f_m t)\cos(2\pi f_c t)$$

is transmitted through the channel; $f_c$ is the carrier frequency and $f_m$ is the modulation frequency. 

Determine the phase delay $\tau_p$. 

Determine the group delay $\tau_g$. 

Display the waveform produced at the channel output; hence, comment on the results obtained in 
parts a and b. 

Notes 


1. For a proof of convergence of the Fourier series, see Kammler (2000). 

2. If a time function g(t) is such that the value of the energy $\int_{-\infty}^{\infty} |g(t)|^2\,dt$ is defined and finite, then 
the Fourier transform G(f) of the function g(t) exists and 

$$\lim_{A \to \infty} \left[ \int_{-\infty}^{\infty} \left| g(t) - \int_{-A}^{A} G(f)\exp(j2\pi f t)\,df \right|^2 dt \right] = 0$$

This result is known as Plancherel’s theorem. For a proof of this theorem, see Titchmarsh (1950). 

3. The notation $\delta(t)$ for a delta function was first introduced into quantum mechanics by Dirac. This 
notation is now in general use in the signal processing literature. For detailed discussions of the delta 
function, see Bracewell (1986). 

In a rigorous sense, the Dirac delta function is a distribution, not a function; for a rigorous treatment 
of the subject, see the book by Lighthill (1958). 

4. The Paley-Wiener criterion is named in honor of the authors of the paper by Paley and Wiener 
(1934). 

5. The integral in (2.54), defining the Hilbert transform of a signal, is an improper integral in that 
the integrand has a singularity at $\tau = t$. To avoid this singularity, the integration must be carried out in 
a symmetrical manner about the point $\tau = t$. For this purpose, we use the definition 

$$P\int_{-\infty}^{\infty} \frac{g(\tau)}{t - \tau}\,d\tau = \lim_{\epsilon \to 0} \left[ \int_{-\infty}^{t-\epsilon} \frac{g(\tau)}{t - \tau}\,d\tau + \int_{t+\epsilon}^{\infty} \frac{g(\tau)}{t - \tau}\,d\tau \right]$$

where the symbol P denotes Cauchy’s principal value of the integral and $\epsilon$ is incrementally 
small. For notational simplicity, the symbol P has been omitted from (2.54) and (2.55). 

6. The complex representation of an arbitrary signal defined in (2.58) was first described by Gabor 
(1946). Gabor used the term “analytic signal.” The term “pre-envelope” was used in Arens (1957) 
and Dugundji (1958). For a review of the different envelopes, see the paper by Rice (1982). 

7. The FFT is ubiquitous in that it is applicable to a great variety of unrelated fields. For a detailed 
mathematical treatment of this widely used tool and its applications, the reader is referred to 
Brigham (1988). 



Probability Theory and 
Bayesian Inference 


Introduction 


The idea of a mathematical model used to describe a physical phenomenon is well 
established in the physical sciences and engineering. In this context, we may distinguish 
two classes of mathematical models: deterministic and probabilistic. A model is said to be 
deterministic if there is no uncertainty about its time-dependent behavior at any instant of 
time; linear time-invariant systems considered in Chapter 2 are examples of a 
deterministic model. However, in many real-world problems, the use of a deterministic 
model is inappropriate because the underlying physical phenomenon involves too many 
unknown factors. In such situations, we resort to a probabilistic model that accounts for 
uncertainty in mathematical terms. 

Probabilistic models are needed for the design of systems that are reliable in 
performance in the face of uncertainty, efficient in computational terms, and cost-effective 
to build. Consider, for example, a digital communication system that is required to 
provide practically error-free communication across a wireless channel. Unfortunately, the 
wireless channel is subject to uncertainties, the sources of which include: 

• noise, internally generated due to thermal agitation of electrons in the conductors 
and electronic devices at the front-end of the receiver; 

• fading of the channel, due to the multipath phenomenon — an inherent characteristic 
of wireless channels; 

• interference, representing spurious electromagnetic waves emitted by other 
communication systems or microwave devices operating in the vicinity of the receiver. 

To account for these uncertainties in the design of a wireless communication system, we 
need a probabilistic model of the wireless channel. 

The objective of this chapter, devoted to probability theory, is twofold: 

• the formulation of a logical basis for the mathematical description of probabilistic 
models and 

• the development of probabilistic reasoning procedures for handling uncertainty. 

Since the probabilistic models are intended to assign probabilities to the collections (sets) 
of possible outcomes of random experiments, we begin the study of probability theory 
with a review of set theory, which we do next. 

Set Theory 


The objects constituting a set are called the elements of the set. Let A be a set and x be an 
element of the set A. To describe this statement, we write $x \in A$; otherwise, we write 
$x \notin A$. If the set A is empty (i.e., it has no elements), we denote it by $\emptyset$. 

If $x_1, x_2, \ldots, x_N$ are all elements of the set A, we write 

$$A = \{x_1, x_2, \ldots, x_N\}$$

in which case we say that the set A is countably finite. Otherwise, the set is said to be 
countably infinite. Consider, for example, an experiment involving the throws of a die. In 
this experiment, there are six possible outcomes: the showing of one, two, three, four, five, 
and six dots on the upper surface of the die; the set of possible outcomes of the experiment 
is therefore countably finite. On the other hand, the set of all possible odd integers, written 
as $\{\pm 1, \pm 3, \pm 5, \ldots\}$, is countably infinite. 

If every element of the set A is also an element of another set B, we say that A is a 
subset of B, which we describe by writing $A \subset B$. 

If two sets A and B satisfy the conditions $A \subset B$ and $B \subset A$, then the two sets are said 
to be identical or equal, in which case we write A = B. 

In a discussion of set theory, we also find it expedient to think of a universal set , 
denoted by S. Such a set contains every possible element that could occur in the context of 
a random experiment. 


To illustrate the validity of Boolean operations on sets, the use of Venn diagrams can be 
helpful, as shown in what follows. 

Unions and Intersections 

The union of two sets A and B is defined by the set of elements that belong to A or B, or to 
both. This operation, written as $A \cup B$, is illustrated in the Venn diagram of Figure 3.1. 
The intersection of two sets A and B is defined by the particular set of elements that belong 
to both A and B, for which we write $A \cap B$. The shaded part of the Venn diagram in 
Figure 3.1 represents this second operation. 


Figure 3.1 Illustrating the union and intersection of two sets, A and B. 

Figure 3.2 Illustrating the partition of set A into three subsets: $A_1$, $A_2$, and $A_3$. 

Let x be an element of interest. Mathematically, the operations of union and 
intersection are respectively described by 

$$A \cup B = \{x \mid x \in A \text{ or } x \in B\}$$

and 

$$A \cap B = \{x \mid x \in A \text{ and } x \in B\}$$

where the symbol | is shorthand for “such that.” 
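
The set-builder definitions above map directly onto set comprehensions. The following is a small sketch using an assumed universal set S of die outcomes and two arbitrarily chosen subsets A and B; it also confirms that the comprehensions agree with Python's built-in `|` and `&` operators.

```python
# Universal set and two example subsets (values chosen only for illustration).
S = set(range(1, 7))
A = {1, 2, 3, 4}
B = {3, 4, 5}

# A ∪ B = {x | x ∈ A or x ∈ B}
union = {x for x in S if x in A or x in B}

# A ∩ B = {x | x ∈ A and x ∈ B}
intersection = {x for x in S if x in A and x in B}

print(sorted(union), sorted(intersection))
print(union == A | B, intersection == A & B)
```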


Disjoint and Partition Sets 

Two sets A and B are said to be disjoint if their intersection is empty; that is, they have no 
common elements. 

The partition of a set A refers to a collection of disjoint subsets $A_1, A_2, \ldots, A_N$ of the set 
A, the union of which equals A; that is, 

$$A = A_1 \cup A_2 \cup \cdots \cup A_N$$

The Venn diagram illustrating the partition operation is depicted in Figure 3.2 for the 
example of N = 3. 
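
The two conditions in the definition of a partition, pairwise disjointness and a union equal to A, can be checked mechanically. The sketch below uses a hypothetical helper `is_partition` and example sets chosen for illustration.

```python
from itertools import combinations

def is_partition(subsets, A):
    """True if the subsets are pairwise disjoint and their union equals A."""
    pairwise_disjoint = all(not (X & Y) for X, Y in combinations(subsets, 2))
    covers_A = set().union(*subsets) == set(A)
    return pairwise_disjoint and covers_A

A = {1, 2, 3, 4, 5, 6}
print(is_partition([{1, 2}, {3}, {4, 5, 6}], A))   # disjoint and covering
print(is_partition([{1, 2}, {2, 3}], {1, 2, 3}))   # subsets overlap
print(is_partition([{1}, {2}], {1, 2, 3}))         # union misses an element
```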

Complements 

The set $A^c$ is said to be the complement of the set A, with respect to the universal set S, if it 
is made up of all the elements of S that do not belong to A, as depicted in Figure 3.3. 

Figure 3.3 Illustrating the complement $A^c$ of set A. 

The Algebra of Sets 

Boolean operations on sets have several properties, summarized here: 

Idempotence property 

$$(A^c)^c = A$$

Commutative property 

$$A \cup B = B \cup A$$

$$A \cap B = B \cap A$$

Associative property 

$$A \cup (B \cup C) = (A \cup B) \cup C$$

$$A \cap (B \cap C) = (A \cap B) \cap C$$

Distributive property 

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$$

$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$$

Note that the commutative, associative, and distributive properties each apply to both the 
union and the intersection. 

De Morgan’s laws 

The complement of the union of two sets A and B is equal to the intersection of their 
respective complements; that is, 

$$(A \cup B)^c = A^c \cap B^c$$

The complement of the intersection of two sets A and B is equal to the union of their 
respective complements; that is, 

$$(A \cap B)^c = A^c \cup B^c$$

For illustrations of these five properties and their confirmation, the reader is referred to 
Problem 3.1. 
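
As a quick sanity check of the algebra of sets, the following minimal Python sketch evaluates each property on one small example. The universal set `S`, the subsets `A`, `B`, `C`, and the helper names `complement` and `check_set_algebra` are assumptions made for illustration only; a single example illustrates the properties but does not prove them.

```python
# Hypothetical universal set and subsets, chosen only for illustration.
S = frozenset(range(10))
A = frozenset({0, 1, 2, 3})
B = frozenset({2, 3, 4, 5})
C = frozenset({3, 5, 7, 9})


def complement(X, universe=S):
    """Return the elements of the universal set that do not belong to X."""
    return universe - X


def check_set_algebra(A, B, C):
    """Evaluate each property on the given sets and report True or False."""
    return {
        "idempotence": complement(complement(A)) == A,
        "commutative_union": A | B == B | A,
        "commutative_intersection": A & B == B & A,
        "associative_union": A | (B | C) == (A | B) | C,
        "associative_intersection": A & (B & C) == (A & B) & C,
        "distributive_intersection": A & (B | C) == (A & B) | (A & C),
        "distributive_union": A | (B & C) == (A | B) & (A | C),
        "de_morgan_union": complement(A | B) == complement(A) & complement(B),
        "de_morgan_intersection": complement(A & B) == complement(A) | complement(B),
    }


results = check_set_algebra(A, B, C)
for name, holds in results.items():
    print(f"{name}: {holds}")
```

Python's built-in `|`, `&`, and `-` operators on sets play the roles of union, intersection, and set difference, so the complement is simply the difference taken against the universal set.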


Probability Theory 


The mathematical description of an experiment with uncertain outcomes is called a 
probabilistic model, the formulation of which rests on three fundamental ingredients:

Sample space or universal set S, which is the set of all conceivable outcomes of a
random experiment under study.

A class E of events that are subsets of S.

Probability law, according to which a nonnegative measure or number P[A] is
assigned to an event A. The measure P[A] is called the probability of event A. In a
sense, P[A] encodes our belief in the likelihood of event A occurring when the
experiment is conducted.

Throughout the book, we will use the symbol P[·] to denote the probability of occurrence
of the event that appears inside the square brackets.

As illustrated in Figure 3.4, an event may involve a single outcome or a subset of
possible outcomes in the sample space S. These possibilities are exemplified by the way in
which three events, A, B, and C, are pictured in Figure 3.4. In light of such a reality, we
identify two extreme cases:

• Sure event, which embodies all the possible outcomes in the sample space S.

• Null or impossible event, which corresponds to the empty set or empty space $\varnothing$.


Fundamentally, the probability measure P[A], assigned to event A in the class E, is 
governed by three axioms: 

Axiom I Nonnegativity The first axiom states that the probability of event A is a
nonnegative number bounded by unity, as shown by

$$0 \le P[A] \le 1 \quad \text{for any event } A$$

Axiom II Additivity The second axiom states that if A and B are two disjoint events,
then the probability of their union satisfies the equality

$$P[A \cup B] = P[A] + P[B] \tag{3.2}$$

In general, if the sample space has N elements and $A_1, A_2, \ldots, A_N$ is a sequence of disjoint
events, then the probability of the union of these N events satisfies the equality

$$P[A_1 \cup A_2 \cup \cdots \cup A_N] = P[A_1] + P[A_2] + \cdots + P[A_N]$$

Axiom III Normalization The third and final axiom states that the probability of the
entire sample space S is equal to unity, as shown by

$$P[S] = 1$$

These three axioms provide an implicit definition of probability. Indeed, we may use them 
to develop some other basic properties of probability, as described next.

The probability of an impossible event is zero.

To prove this property, we first use the axiom of normalization, then express the sample
space S as the union of itself with the empty space $\varnothing$, and then use the axiom of
additivity. We thus write

$$\begin{aligned}
1 &= P[S] \\
  &= P[S \cup \varnothing] \\
  &= P[S] + P[\varnothing] \\
  &= 1 + P[\varnothing]
\end{aligned}$$

from which the property $P[\varnothing] = 0$ follows immediately.

Let $A^c$ denote the complement of event A; we may then write

$$P[A^c] = 1 - P[A] \quad \text{for any event } A \tag{3.4}$$

To prove this property, we first note that the sample space S is the union of the two
mutually exclusive events A and $A^c$. Hence, the use of the additivity and normalization
axioms yields

$$\begin{aligned}
1 &= P[S] \\
  &= P[A \cup A^c] \\
  &= P[A] + P[A^c]
\end{aligned}$$

from which, after rearranging terms, (3.4) follows immediately.

If event A lies within the subspace of another event B, then

$$P[A] \le P[B] \quad \text{for } A \subset B \tag{3.5}$$

To prove this third property, consider the Venn diagram depicted in Figure 3.5. From this
diagram, we observe that event B may be expressed as the union of two disjoint events, one
defined by A and the other defined by the intersection of B with the complement of A; that is,

$$B = A \cup (B \cap A^c)$$

[Figure 3.5: The Venn diagram for proving (3.5).]

Therefore, applying the additivity axiom to this relation, we get

$$P[B] = P[A] + P[B \cap A^c]$$

Next, invoking the nonnegativity axiom, we immediately find that the probability of event 
B must be equal to or greater than the probability of event A, as indicated in (3.5). 


Let N disjoint events $A_1, A_2, \ldots, A_N$ satisfy the condition

$$A_1 \cup A_2 \cup \cdots \cup A_N = S \tag{3.6}$$

then

$$P[A_1] + P[A_2] + \cdots + P[A_N] = 1 \tag{3.7}$$

To prove this fourth property, we first apply the normalization axiom to (3.6) to write

$$P[A_1 \cup A_2 \cup \cdots \cup A_N] = 1$$

Next, we recall the generalized form of the additivity axiom:

$$P[A_1 \cup A_2 \cup \cdots \cup A_N] = P[A_1] + P[A_2] + \cdots + P[A_N]$$

From these two relations, (3.7) follows immediately.

For the special case of N equally probable events, (3.7) reduces to

$$P[A_i] = \frac{1}{N} \quad \text{for } i = 1, 2, \ldots, N$$


If two events A and B are not disjoint, then the probability of their union event is given by

$$P[A \cup B] = P[A] + P[B] - P[A \cap B] \quad \text{for any two events } A \text{ and } B \tag{3.9}$$

where $P[A \cap B]$ is called the joint probability of A and B.

To prove this last property, consider the Venn diagram of Figure 3.6. From this figure,
we first observe that the union of A and B may be expressed as the union of two disjoint
events: A itself and $A^c \cap B$, where $A^c$ is the complement of A. We may therefore apply the
additivity axiom to write

$$\begin{aligned}
P[A \cup B] &= P[A \cup (A^c \cap B)] \\
            &= P[A] + P[A^c \cap B]
\end{aligned} \tag{3.10}$$


[Figure 3.6: The Venn diagram for proving (3.9).]

From the Venn diagram of Figure 3.6, we next observe that the event B may be expressed as

$$\begin{aligned}
B &= S \cap B \\
  &= (A \cup A^c) \cap B \\
  &= (A \cap B) \cup (A^c \cap B)
\end{aligned}$$

That is, B is the union of two disjoint events: $A \cap B$ and $A^c \cap B$; therefore, applying the
additivity axiom to this second relation yields

$$P[B] = P[A \cap B] + P[A^c \cap B] \tag{3.11}$$

Subtracting (3.11) from (3.10), canceling the common term $P[A^c \cap B]$, and rearranging
terms, (3.9) follows and Property 5 is proved.

It is of interest to note that the joint probability $P[A \cap B]$ accounts for that part of the
sample space S where the events A and B coincide. If these two events are disjoint, then the
joint probability $P[A \cap B]$ is zero, in which case (3.9) reduces to the additivity axiom of
(3.2).
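
The properties derived above can be checked on a small example. The sketch below assumes a hypothetical experiment, one throw of a fair die, in which every outcome is equally likely; the events `A` and `B` and the helper `prob` are illustrative names, not part of the text. Exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

# Hypothetical experiment: one throw of a fair six-sided die.
S = frozenset(range(1, 7))


def prob(event):
    """Probability law for equally likely outcomes: |event ∩ S| / |S|."""
    return Fraction(len(frozenset(event) & S), len(S))


A = frozenset({2, 4, 6})   # an even number of dots shows
B = frozenset({4, 5, 6})   # more than three dots show

p_union = prob(A | B)                          # left-hand side of (3.9)
p_formula = prob(A) + prob(B) - prob(A & B)    # right-hand side of (3.9)

print("P[A ∪ B]               =", p_union)
print("P[A] + P[B] - P[A ∩ B] =", p_formula)
print("P[A^c]                 =", prob(S - A))
print("P[empty set]           =", prob(frozenset()))
```

Because A and B overlap in the outcomes 4 and 6, simply adding P[A] and P[B] would count those outcomes twice; subtracting the joint probability removes the double count.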


When an experiment is performed and we only obtain partial information on the outcome 
of the experiment, we may reason about that particular outcome by invoking the notion of 
conditional probability. Stated the other way round, we may make the statement: 


To be specific, suppose we perform an experiment that involves a pair of events A and B. 
Let P[A|B] denote the probability of event A given that event B has occurred. The 
probability P[A|B] is called the conditional probability of A given B. Assuming that B has 
nonzero probability, the conditional probability P[A|B] is formally defined by 


$$P[A|B] = \frac{P[A \cap B]}{P[B]} \tag{3.12}$$

where $P[A \cap B]$ is the joint probability of events A and B, and P[B] is nonzero.

For a fixed event B, the conditional probability P[A|B] is a legitimate probability law
as it satisfies all three axioms of probability:

Since P[A|B] is the ratio of the nonnegative number $P[A \cap B]$ to the positive number
P[B], the nonnegativity axiom is clearly satisfied.

Viewing the entire sample space S as event A and noting that $S \cap B = B$, we may
use (3.12) to write

$$P[S|B] = \frac{P[S \cap B]}{P[B]} = \frac{P[B]}{P[B]} = 1$$

Hence, the normalization axiom is also satisfied.

Finally, to verify the additivity axiom, assume that $A_1$ and $A_2$ are two mutually
exclusive events. We may then use (3.12) to write

$$P[A_1 \cup A_2 | B] = \frac{P[(A_1 \cup A_2) \cap B]}{P[B]}$$

Applying the distributive property to the numerator on the right-hand side, we have

$$P[A_1 \cup A_2 | B] = \frac{P[(A_1 \cap B) \cup (A_2 \cap B)]}{P[B]}$$

Next, recognizing that the two events $A_1 \cap B$ and $A_2 \cap B$ are actually disjoint, we
may apply the additivity axiom to write

$$\begin{aligned}
P[A_1 \cup A_2 | B] &= \frac{P[A_1 \cap B] + P[A_2 \cap B]}{P[B]} \\
&= \frac{P[A_1 \cap B]}{P[B]} + \frac{P[A_2 \cap B]}{P[B]} \\
&= P[A_1|B] + P[A_2|B]
\end{aligned}$$

which proves that the conditional probability also satisfies the additivity axiom.

We therefore conclude that all three axioms of probability (and therefore all known
properties of probability laws) are equally valid for the conditional probability P[A|B]. In
a sense, this conditional probability captures the partial information that the occurrence of
event B provides about event A; we may therefore view the conditional probability P[A|B]
as a probability law concentrated on event B.
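
The three checks above can be reproduced numerically. The following sketch again assumes a fair die as a hypothetical experiment; the events `B`, `A1`, `A2` and the helper names `prob` and `cond_prob` are chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical experiment: one throw of a fair six-sided die.
S = frozenset(range(1, 7))


def prob(event):
    """Probability law for equally likely outcomes."""
    return Fraction(len(frozenset(event) & S), len(S))


def cond_prob(A, B):
    """P[A|B] = P[A ∩ B] / P[B], defined only when P[B] is nonzero."""
    A, B = frozenset(A), frozenset(B)
    if prob(B) == 0:
        raise ValueError("P[B] must be nonzero")
    return prob(A & B) / prob(B)


B = {4, 5, 6}    # conditioning event: more than three dots show
A1 = {1, 4}      # A1 and A2 are mutually exclusive
A2 = {2, 6}

normalization = cond_prob(S, B)                  # P[S|B]
lhs = cond_prob(A1 | A2, B)                      # P[A1 ∪ A2 | B]
rhs = cond_prob(A1, B) + cond_prob(A2, B)        # P[A1|B] + P[A2|B]

print(normalization, lhs, rhs)
```

Conditioning on B effectively shrinks the sample space to the outcomes in B and renormalizes, which is why the conditional probability behaves like an ordinary probability law.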


Suppose we are confronted with a situation where the conditional probability P[A|B] and 
the individual probabilities P[A] and P[B] are all easily determined directly, but the
conditional probability P[B|A] is desired. To deal with this situation, we first rewrite
(3.12) in the form

$$P[A \cap B] = P[A|B]P[B]$$

Clearly, we may equally write

$$P[A \cap B] = P[B|A]P[A]$$

The left-hand parts of these two relations are identical; we therefore have

$$P[A|B]P[B] = P[B|A]P[A]$$

Provided that P[A] is nonzero, we may determine the desired conditional probability
P[B|A] by using the relation

$$P[B|A] = \frac{P[A|B]P[B]}{P[A]} \tag{3.14}$$

This relation is known as Bayes’ rule.

As simple as it looks, Bayes’ rule provides the correct language for describing
inference, the formulation of which cannot be done without making assumptions. The
following example illustrates an application of Bayes’ rule.

Radar Detection 

Radar, a remote sensing system, operates by transmitting a sequence of pulses and has its
receiver listen to echoes produced by a target (e.g., aircraft) that could be present in its
surveillance area.

Let the events A and B be defined as follows:

A = {a target is present in the area under surveillance}

$A^c$ = {there is no target in the area}

B = {the radar receiver detects a target}

In the radar detection problem, there are three probabilities of particular interest:

P[A]         probability that a target is present in the area; this probability is called the
             prior probability.

P[B|A]       probability that the radar receiver detects a target, given that a target is
             actually present in the area; this second probability is called the probability
             of detection.

$P[B|A^c]$   probability that the radar receiver detects a target in the area, given that there
             is no target in the surveillance area; this third probability is called the
             probability of false alarm.

Suppose these three probabilities have the following values:

P[A] = 0.02
P[B|A] = 0.99
$P[B|A^c]$ = 0.01

The problem is to calculate the conditional probability P[A|B], which defines the
probability that a target is present in the surveillance area given that the radar receiver has
made a target detection.

Applying Bayes’ rule, we write

$$\begin{aligned}
P[A|B] &= \frac{P[B|A]P[A]}{P[B]} \\
&= \frac{P[B|A]P[A]}{P[B|A]P[A] + P[B|A^c]P[A^c]} \\
&= \frac{0.99 \times 0.02}{0.99 \times 0.02 + 0.01 \times 0.98} \\
&= \frac{0.0198}{0.0296} \\
&\approx 0.67
\end{aligned}$$
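
The arithmetic of this example is easy to reproduce. The sketch below is a minimal illustration; the function name `bayes_posterior` and its parameter names are assumptions introduced here, and the denominator expands P[B] over the two mutually exclusive cases "target present" and "no target."

```python
def bayes_posterior(prior, p_detect, p_false_alarm):
    """Return P[A|B] from P[A], P[B|A], and P[B|A^c] using Bayes' rule."""
    # P[B] = P[B|A]P[A] + P[B|A^c]P[A^c]
    evidence = p_detect * prior + p_false_alarm * (1.0 - prior)
    return p_detect * prior / evidence


posterior = bayes_posterior(prior=0.02, p_detect=0.99, p_false_alarm=0.01)
print(f"P[A|B] = {posterior:.4f}")
```

The ratio 0.0198/0.0296 evaluates to roughly 0.669. Even with a 99% probability of detection, the small prior probability of a target keeps the posterior well below unity.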


Suppose that the occurrence of event A provides no information whatsoever about event B;
that is,

$$P[B|A] = P[B]$$

Then, (3.14) also teaches us that

$$P[A|B] = P[A]$$

In this special case, we see that knowledge of the occurrence of either event, A or B, tells
us no more about the probability of occurrence of the other event than we knew without
that knowledge. Events A and B that satisfy this condition are said to be independent.
From the definition of conditional probability given in (3.12), namely,

$$P[A|B] = \frac{P[A \cap B]}{P[B]}$$

we see that the condition P[A|B] = P[A] is equivalent to

$$P[A \cap B] = P[A]P[B]$$

We therefore adopt this latter relation as the formal definition of independence. The
important point to note here is that the definition still holds even if the probability P[B] is
zero, in which case the conditional probability P[A|B] is undefined. Moreover, the
definition has a symmetric property, in light of which we can say the following:


Random Variables 


It is customary, particularly when using the language of sample space pertaining to an 
experiment, to describe the outcome of the experiment by using one or more real-valued 
quantities or measurements that help us think in probabilistic terms. These quantities are 
called random variables, for which we offer the following definition: 


The following two examples illustrate the notion of a random variable embodied in this 
definition. 

Consider, for example, the sample space that represents the integers 1, 2, ..., 6, each 
one of which is the number of dots that shows uppermost when a die is thrown. Let the 
sample point k denote the event that k dots show in one throw of the die. The random 
variable used to describe the probabilistic event k in this experiment is said to be a discrete 
random variable. 

For an entirely different experiment, consider the noise being observed at the front end 
of a communication receiver. In this new situation, the random variable, representing the 
amplitude of the noise voltage at a particular instant of time, occupies a continuous range 
of values, both positive and negative. Accordingly, the random variable representing the 
noise amplitude is said to be a continuous random variable. 

The concept of a continuous random variable is illustrated in Figure 3.7, which is a 
modified version of Figure 3.4. Specifically, for the sake of clarity, we have suppressed the 
events but show subsets of the sample space S being mapped directly to a subset of a real 
line representing the random variable. The notion of the random variable depicted in 
Figure 3.7 applies in exactly the same manner as it applies to the underlying events. The 
benefit of random variables, pictured in Figure 3.7, is that probability analysis can now be 
developed in terms of real-valued quantities, regardless of the form or shape of the 
underlying events of the random experiment under study.

[Figure 3.7: Illustration of the relationship between sample space, random variables, and probability.]


One last comment is in order before we proceed further. Throughout the whole book, 
we will be using the following notation: 


Distribution Functions 


To proceed with the probability analysis in mathematical terms, we need a probabilistic 
description of random variables that works equally well for discrete and continuous 
random variables. Let us consider the random variable X and the probability of the event 
$X \le x$. We denote this probability by $P[X \le x]$. It is apparent that this probability is a
function of the dummy variable x. To simplify the notation, we write

$$F_X(x) = P[X \le x] \quad \text{for all } x \tag{3.15}$$

The function $F_X(x)$ is called the cumulative distribution function or simply the distribution
function of the random variable X. Note that $F_X(x)$ is a function of x, not of the random
variable X. For any point x in the sample space, the distribution function $F_X(x)$ expresses
the probability of an event.

The distribution function $F_X(x)$, applicable to both continuous and discrete random
variables, has two fundamental properties:

Boundedness of the Distribution

The distribution function $F_X(x)$ is a bounded function of the dummy variable x that lies
between zero and one.

Specifically, $F_X(x)$ tends to zero as x tends to $-\infty$, and it tends to one as x tends to $\infty$.

Monotonicity of the Distribution

The distribution function $F_X(x)$ is a monotone nondecreasing function of x.

In mathematical terms, we write

$$F_X(x_1) \le F_X(x_2) \quad \text{for } x_1 < x_2$$

Both of these properties follow directly from (3.15).

The random variable X is said to be continuous if the distribution function $F_X(x)$ is
differentiable with respect to the dummy variable x everywhere, as shown by

$$f_X(x) = \frac{d}{dx} F_X(x) \quad \text{for all } x$$

The new function $f_X(x)$ is called the probability density function of the random variable X.
The name, density function, arises from the fact that the probability of the event $x_1 < X \le x_2$ is

$$\begin{aligned}
P[x_1 < X \le x_2] &= P[X \le x_2] - P[X \le x_1] \\
&= \int_{x_1}^{x_2} f_X(x)\,dx
\end{aligned} \tag{3.17}$$

The probability of an interval is therefore the area under the probability density function in
that interval. Putting $x_1 = -\infty$ in (3.17) and changing the notation somewhat, we readily
see that the distribution function is defined in terms of the probability density function as

$$F_X(x) = \int_{-\infty}^{x} f_X(\xi)\,d\xi$$

where $\xi$ is a dummy variable. Since $F_X(\infty) = 1$, corresponding to the probability of a
sure event, and $F_X(-\infty) = 0$, corresponding to the probability of an impossible event, we
readily find from (3.17) that

$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$$

Earlier we mentioned that a distribution function must always be a monotone 
nondecreasing function of its argument. It follows, therefore, that the probability density 
function must always be nonnegative. Accordingly, we may now formally make the 
statement: 


Nonnegativity 

The probability density function $f_X(x)$ is a nonnegative function of the sample value x of
the random variable X.

Normalization

The total area under the graph of the probability density function $f_X(x)$ is equal to unity.

An important point that should be stressed here is that the probability density function
$f_X(x)$ contains all the conceivable information needed for statistical characterization of the
random variable X.


Uniform Distribution 

To illustrate the properties of the distribution function $F_X(x)$ and the probability density
function $f_X(x)$ for a continuous random variable, consider a uniformly distributed random
variable, described by

$$f_X(x) = \begin{cases}
0, & x \le a \\
\dfrac{1}{b - a}, & a < x \le b \\
0, & x > b
\end{cases}$$

Integrating $f_X(x)$ with respect to x yields the associated distribution function

$$F_X(x) = \begin{cases}
0, & x \le a \\
\dfrac{x - a}{b - a}, & a < x \le b \\
1, & x > b
\end{cases}$$

Plots of these two functions versus the dummy variable x are shown in Figure 3.8.

[Figure 3.8: Uniform distribution.]
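
The relationship between the density and the distribution function of a uniformly distributed random variable can be verified numerically. The sketch below is illustrative only: the endpoints `a = 2` and `b = 5`, the helper `integrate` (a simple midpoint rule), and the function names are assumptions, not part of the text.

```python
def uniform_pdf(x, a, b):
    """Density of a random variable uniformly distributed over (a, b]."""
    return 1.0 / (b - a) if a < x <= b else 0.0


def uniform_cdf(x, a, b):
    """Distribution function obtained by integrating the density up to x."""
    if x <= a:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return 1.0


def integrate(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


a, b = 2.0, 5.0  # hypothetical interval endpoints

# Total area under the density (should be close to 1).
area = integrate(lambda x: uniform_pdf(x, a, b), a - 1.0, b + 1.0)

# Area up to x = 3.5 (should be close to the distribution function at 3.5).
partial = integrate(lambda x: uniform_pdf(x, a, b), a - 1.0, 3.5)

print(area, partial, uniform_cdf(3.5, a, b))
```

Note that the distribution function saturates at one, not zero, for x > b, in keeping with the boundedness and monotonicity properties.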



Consider next the case of a discrete random variable, X, which is a real-valued function of 
the outcome of a probabilistic experiment that can take a finite or countably infinite 
number of values. As mentioned previously, the distribution function $F_X(x)$ defined in
(3.15) also applies to discrete random variables. However, unlike a continuous random
variable, the distribution function of a discrete random variable is not differentiable with
respect to its dummy variable x.




To get around this mathematical difficulty, we introduce the notion of the probability 
mass function as another way of characterizing discrete random variables. Let X denote a 
discrete random variable and let x be any possible value of X taken from a set of real 
numbers. We may then make the statement: 


Stated in mathematical terms, we write

$$p_X(x) = P[X = x] \tag{3.22}$$

which is illustrated in the next example.


The Bernoulli Random Variable 

Consider a probabilistic experiment involving the discrete random variable X that takes 
one of two possible values: 

• the value 1 with probability p;

• the value 0 with probability 1 - p.

Such a random variable is called the Bernoulli random variable, the probability mass
function of which is defined by

$$p_X(x) = \begin{cases}
1 - p, & x = 0 \\
p, & x = 1 \\
0, & \text{otherwise}
\end{cases}$$

This probability mass function is illustrated in Figure 3.9 for the fair-coin case p = 1/2.
The two delta functions, each of weight 1/2, depicted in Figure 3.9 represent the
probability mass function at each of the sample points x = 0 and x = 1.

[Figure 3.9: Illustrating the probability mass function for a fair coin-tossing experiment.]
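
To make the link between the probability mass function and the distribution function concrete, the following sketch tabulates both for a Bernoulli random variable. The function names and the choice p = 1/2 (the fair-coin case) are assumptions made for illustration.

```python
from fractions import Fraction


def bernoulli_pmf(x, p):
    """Probability mass function: P[X = x] for a Bernoulli random variable."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0


def bernoulli_cdf(x, p):
    """F_X(x) = P[X <= x]: a staircase with jumps at x = 0 and x = 1."""
    return sum(bernoulli_pmf(k, p) for k in (0, 1) if k <= x)


p = Fraction(1, 2)  # fair coin
for x in (-1, 0, 0.5, 1, 2):
    print(x, bernoulli_pmf(x, p), bernoulli_cdf(x, p))
```

The distribution function is constant between the sample points and jumps by the probability mass at each of them, which is why it is not differentiable there.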


From here on, we will, largely but not exclusively, focus on the characterization of 
continuous random variables. A parallel development and similar concepts are possible for 
discrete random variables as well.

Thus far we have focused attention on situations involving a single random variable.
However, we frequently find that the outcome of an experiment requires several random 
variables for its description. In what follows, we consider situations involving two random 
variables. The probabilistic description developed in this way may be readily extended to 
any number of random variables. 

Consider two random variables X and Y. In this new situation, we say: 


The variables X and Y may be two separate one-dimensional random variables or the 
components of a single two-dimensional random vector. In either case, the joint sample 
space is the xy-plane. The joint distribution function $F_{X,Y}(x,y)$ is the probability that the
outcome of an experiment will result in a sample point lying inside the quadrant
$(-\infty < X \le x,\ -\infty < Y \le y)$ of the joint sample space. That is,

$$F_{X,Y}(x,y) = P[X \le x, Y \le y]$$

Suppose that the joint distribution function $F_{X,Y}(x,y)$ is continuous everywhere and that the
second-order partial derivative

$$f_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x\,\partial y} \tag{3.25}$$

exists and is continuous everywhere too. We call the new function $f_{X,Y}(x,y)$ the joint
probability density function of the random variables X and Y. The joint distribution
function $F_{X,Y}(x,y)$ is a monotone nondecreasing function of both x and y. Therefore, from
(3.25) it follows that the joint probability density function $f_{X,Y}(x,y)$ is always nonnegative.
Also, the total volume under the graph of a joint probability density function must be
unity, as shown by the double integral

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx\,dy = 1$$

The so-called marginal probability density functions, $f_X(x)$ and $f_Y(y)$, are obtained by
differentiating the corresponding marginal distribution functions

$$F_X(x) = F_{X,Y}(x,\infty)$$

and

$$F_Y(y) = F_{X,Y}(\infty,y)$$

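with respect to the dummy variables x and y, respectively. We thus write

$$\begin{aligned}
f_X(x) &= \frac{d}{dx}\left[\int_{-\infty}^{x}\int_{-\infty}^{\infty} f_{X,Y}(\xi,y)\,dy\,d\xi\right] \\
&= \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy
\end{aligned} \tag{3.27}$$

Similarly, we write

$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx \tag{3.28}$$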


In words, the first marginal probability density function $f_X(x)$, defined in (3.27), is
obtained from the joint probability density function $f_{X,Y}(x,y)$ by simply integrating it over
all possible values of the undesired random variable Y. Similarly, the second marginal
probability density function $f_Y(y)$, defined in (3.28), is obtained from $f_{X,Y}(x,y)$ by
integrating it over all possible values of the undesired random variable; this time, the
undesired random variable is X. Henceforth, we refer to $f_X(x)$ and $f_Y(y)$, obtained in the
manner described herein, as the marginal densities of the random variables X and Y, whose
joint probability density function is $f_{X,Y}(x,y)$. Here again, we conclude the discussion on a
pair of random variables with the following statement:


This statement can be generalized to cover the joint probability density function of many 
random variables. 
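
Marginalization can be illustrated numerically. The sketch below assumes a hypothetical joint density, x + y on the unit square and zero elsewhere, which is not taken from the text; for this choice the marginal density of X works out to x + 1/2 on the interval from 0 to 1. The helper `integrate` is a simple midpoint rule.

```python
def joint_pdf(x, y):
    """Hypothetical joint density: x + y on the unit square, zero elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def integrate(f, lo, hi, steps=2_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def marginal_x(x):
    """f_X(x): integrate the joint density over all values of y."""
    # The density vanishes outside the unit square, so [0, 1] suffices.
    return integrate(lambda y: joint_pdf(x, y), 0.0, 1.0)


def marginal_y(y):
    """f_Y(y): integrate the joint density over all values of x."""
    return integrate(lambda x: joint_pdf(x, y), 0.0, 1.0)


# Total volume under the joint density, computed via the marginal of X.
total_volume = integrate(marginal_x, 0.0, 1.0, steps=200)

print(marginal_x(0.3), marginal_y(0.9), total_volume)
```

Integrating out the undesired variable leaves a function of the remaining variable alone, and integrating that marginal in turn recovers the unit total volume.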


Suppose that X and Y are two continuous random variables with their joint probability 
density function $f_{X,Y}(x,y)$. The conditional probability density function of Y, given that
X = x, is defined by

$$f_Y(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)} \tag{3.29}$$

provided that $f_X(x) > 0$, where $f_X(x)$ is the marginal density of X; $f_Y(y|x)$ is a shortened
version of $f_{Y|X}(y|x)$, both of which are used interchangeably. The function $f_Y(y|x)$ may be
thought of as a function of the variable y, with the variable x arbitrary but fixed;
accordingly, it satisfies all the requirements of an ordinary probability density function for
any x, as shown by

$$f_Y(y|x) \ge 0$$

and

$$\int_{-\infty}^{\infty} f_Y(y|x)\,dy = 1$$

Cross-multiplying terms in (3.29) yields

$$f_{X,Y}(x,y) = f_Y(y|x) f_X(x) \tag{3.30}$$

which is referred to as the multiplication rule.

Suppose that knowledge of the outcome of X can, in no way, affect the distribution of Y.
Then, the conditional probability density function $f_Y(y|x)$ reduces to the marginal density
$f_Y(y)$, as shown by


$$f_Y(y|x) = f_Y(y)$$

In such a case, we may express the joint probability density function of the random
variables X and Y as the product of their respective marginal densities; that is,

$$f_{X,Y}(x,y) = f_X(x) f_Y(y)$$

On the basis of this relation, we may now make the following statement on the
independence of random variables:


Let X and Y be two continuous random variables that are statistically independent; their
respective probability density functions are denoted by $f_X(x)$ and $f_Y(y)$. Define the sum

$$Z = X + Y$$

The issue of interest is to find the probability density function of the new random variable
Z, which is denoted by $f_Z(z)$.

To proceed with this evaluation, we first use probabilistic arguments to write

$$\begin{aligned}
P[Z \le z | X = x] &= P[X + Y \le z | X = x] \\
&= P[x + Y \le z | X = x]
\end{aligned}$$

where, in the second line, the given value x is used for the random variable X. Since X and
Y are statistically independent, we may simplify matters by writing

$$\begin{aligned}
P[Z \le z | X = x] &= P[x + Y \le z] \\
&= P[Y \le z - x]
\end{aligned}$$

Equivalently, in terms of the pertinent distribution functions, we may write

$$F_Z(z|x) = F_Y(z - x)$$

Hence, differentiating both sides of this equation with respect to z, we get the
corresponding probability density functions

$$f_Z(z|x) = f_Y(z - x)$$

Using the multiplication rule described in (3.30), we have

$$f_{Z,X}(z,x) = f_Y(z - x) f_X(x) \tag{3.31}$$

Next, adapting the definition of the marginal density given in (3.27) to the problem at
hand, we write

$$f_Z(z) = \int_{-\infty}^{\infty} f_{Z,X}(z,x)\,dx \tag{3.32}$$

Finally, substituting (3.31) into (3.32), we find that the desired $f_Z(z)$ is equal to the
convolution of $f_X(x)$ and $f_Y(y)$, as shown by

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z - x)\,dx \tag{3.33}$$

In words, we may therefore state:

The probability density function of the sum of two statistically independent continuous
random variables is the convolution of their respective probability density functions.

Note, however, that no assumptions were made in arriving at this statement other than the
random variables X and Y being statistically independent continuous random variables.
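
The convolution result of (3.33) can be checked numerically on a simple case. The sketch below assumes, purely for illustration, that X and Y are independent and each uniformly distributed over (0, 1]; the density of their sum is then the well-known triangular density on (0, 2]. The function names, integration limits, and step count are assumptions, and the integral is approximated with a midpoint rule.

```python
def uniform01(x):
    """Density of a random variable uniformly distributed over (0, 1]."""
    return 1.0 if 0.0 < x <= 1.0 else 0.0


def convolve_at(f_x, f_y, z, lo=-1.0, hi=3.0, steps=4_000):
    """Midpoint-rule approximation of the integral of f_x(x) * f_y(z - x) over x."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += f_x(x) * f_y(z - x)
    return total * h


def triangular(z):
    """Closed-form density of the sum of two independent Uniform(0, 1] variables."""
    if 0.0 < z <= 1.0:
        return z
    if 1.0 < z <= 2.0:
        return 2.0 - z
    return 0.0


for z in (0.5, 1.0, 1.5, 2.5):
    print(z, convolve_at(uniform01, uniform01, z), triangular(z))
```

For each value of z, the integrand is nonzero only where both x and z - x fall inside the unit interval, so the numerical result tracks the length of that overlap.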

The Concept of Expectation 


As pointed out earlier, the probability density function $f_X(x)$ provides a complete statistical
description of a continuous random variable X. However, in many instances, we find that 
this description includes more detail than is deemed to be essential for practical 
applications. In situations of this kind, simple statistical averages are usually considered 
to be adequate for the statistical characterization of the random variable X. 

In this section, we focus attention on the first-order statistical average, called the 
expected value or mean of a random variable; second-order statistical averages are studied 
in the next section. The rationale for focusing attention on the mean of a random variable 
is its practical importance in statistical terms, as explained next. 


The expected value or mean of a continuous random variable X is formally defined by

$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx \tag{3.34}$$

where E denotes the expectation or averaging operator. According to this definition, the
expectation operator E, applied to a continuous random variable X, produces a single
number that is derived uniquely from the probability density function $f_X(x)$.

To describe the meaning of the defining equation (3.34), we may say the following:


To elaborate on this statement, we write the integral in (3.34) as the limit of an
approximating sum formulated as follows. Let $\{x_k \mid k = 0, \pm 1, \pm 2, \ldots\}$ denote a set of
uniformly spaced points on the real line

$$x_k = \left(k + \frac{1}{2}\right)\Delta, \quad k = 0, \pm 1, \pm 2, \ldots$$

where $\Delta$ is the spacing between adjacent points on the line. We may thus rewrite (3.34) in
the form of a limit as follows:

$$\begin{aligned}
E[X] &= \lim_{\Delta \to 0} \sum_{k=-\infty}^{\infty} \int_{k\Delta}^{(k+1)\Delta} x_k f_X(x)\,dx \\
&= \lim_{\Delta \to 0} \sum_{k=-\infty}^{\infty} x_k\, P\!\left[x_k - \frac{\Delta}{2} < X \le x_k + \frac{\Delta}{2}\right]
\end{aligned} \tag{3.35}$$

For a physical interpretation of the sum in the second line of the right-hand side of this
equation, suppose that we make n independent observations of the random variable X. Let
$N_n(k)$ denote the number of times that the random variable X falls inside the kth bin,
defined by

$$x_k - \frac{\Delta}{2} < X \le x_k + \frac{\Delta}{2}, \quad k = 0, \pm 1, \pm 2, \ldots$$

Arguing heuristically, we may say that, as the number of observations n is made large, the
ratio $N_n(k)/n$ approaches the probability $P[x_k - \Delta/2 < X \le x_k + \Delta/2]$. Accordingly, we may
approximate the expected value of the random variable X as

$$\begin{aligned}
E[X] &\approx \sum_{k=-\infty}^{\infty} x_k \left(\frac{N_n(k)}{n}\right) \\
&= \frac{1}{n} \sum_{k=-\infty}^{\infty} x_k N_n(k), \quad \text{for large } n
\end{aligned} \tag{3.36}$$

We now recognize the quantity on the right-hand side of (3.36) simply as the “sample
average.” The sum is taken over all the values $x_k$, each of which is weighted by the number
of times it occurs; the sum is then divided by the total number of observations to give the
sample average. Indeed, (3.36) provides the basis for computing the expectation E[X].

In a loose sense, we may say that the discretization, introduced in (3.35), has changed
the expectation of a continuous random variable to the sample averaging over a discrete
random variable. Indeed, in light of (3.36), we may formally define the expectation of a
discrete random variable X as

$$E[X] = \sum_{x} x\, p_X(x) \tag{3.37}$$

where $p_X(x)$ is the probability mass function of X, defined in (3.22), and where the
summation extends over all possible discrete values of the dummy variable x. Comparing
the summation in (3.37) with that of (3.36), we see that, roughly speaking, the ratio $N_n(x)/n$
plays a role similar to that of the probability mass function $p_X(x)$, which is intuitively
satisfying.

Just as in the case of a continuous random variable, here again we see from the defining
equation (3.37) that the expectation operator E, applied to a discrete random variable X,
produces a single number derived uniquely from the probability mass function $p_X(x)$.

Simply put, the expectation operator E applies equally well to discrete and continuous
random variables.


The expectation operator E plays a dominant role in the statistical analysis of random
variables (as well as random processes studied in Chapter 4). It is therefore befitting that
we study two important properties of this operation in this section; other properties are
addressed in the end-of-chapter Problem 3.13.
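
Before turning to these properties, the link between the sample average of (3.36) and the defining sum of (3.37) can be illustrated with a short simulation. The sketch assumes a hypothetical discrete random variable, the number of dots shown by a fair die; the function names, the sample size, and the fixed seed are illustrative choices, and the simulated value only approximates the exact mean.

```python
import random
from fractions import Fraction

# Hypothetical discrete random variable: number of dots on a fair die.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}


def expectation(pmf):
    """E[X] as the sum of x * p_X(x) over all possible values x."""
    return sum(x * p for x, p in pmf.items())


def sample_average(pmf, n, seed=0):
    """Average of n independent observations drawn according to the pmf."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = list(pmf)
    weights = [float(p) for p in pmf.values()]
    draws = rng.choices(values, weights=weights, k=n)
    return sum(draws) / n


mean_exact = expectation(pmf)
mean_sampled = sample_average(pmf, n=100_000)

print("exact mean    :", mean_exact)
print("sample average:", mean_sampled)
```

As the number of observations grows, the relative frequency of each value approaches its probability mass, so the sample average settles near the exact mean.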
Linearity

Consider a random variable Z, defined by

$$Z = X + Y$$

where X and Y are two continuous random variables whose probability density functions
are respectively denoted by $f_X(x)$ and $f_Y(y)$. Extending the definition of expectation
introduced in (3.34) to the random variable Z, we write

$$E[Z] = \int_{-\infty}^{\infty} z f_Z(z)\,dz$$

where $f_Z(z)$ is defined by the convolution integral of (3.33). Accordingly, we may go on to
express the expectation E[Z] as the double integral

$$\begin{aligned}
E[Z] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} z f_X(x) f_Y(z - x)\,dx\,dz \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} z f_{X,Y}(x, z - x)\,dx\,dz
\end{aligned}$$

where the joint probability density function

$$f_{X,Y}(x, z - x) = f_X(x) f_Y(z - x)$$

Making the one-to-one change of variables

$$y = z - x$$

and

$$x = x$$

we may now express the expectation E[Z] in the expanded form

$$\begin{aligned}
E[Z] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x + y) f_{X,Y}(x,y)\,dx\,dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x f_{X,Y}(x,y)\,dx\,dy + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y f_{X,Y}(x,y)\,dx\,dy
\end{aligned}$$

Next, we recall from (3.27) that the first marginal density of the random variable X is

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy$$

and, similarly, for the second marginal density

$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx$$

The formula for the expectation E[Z] is therefore simplified as follows:

$$\begin{aligned}
E[Z] &= \int_{-\infty}^{\infty} x f_X(x)\,dx + \int_{-\infty}^{\infty} y f_Y(y)\,dy \\
&= E[X] + E[Y]
\end{aligned}$$

We may extend this result to the sum of many random variables by the method of induction
and thus write that, in general,

$$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$$

In words, we may therefore state:

The expectation of a sum of random variables is equal to the sum of the individual
expectations.

This result establishes the linearity property of the expectation operator, which makes this
operator all the more appealing.
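
The linearity property can be checked on a discrete analog of the derivation above. The sketch assumes a small hypothetical joint probability mass function for a pair (X, Y); the two variables are deliberately made dependent to show that the expanded form of E[Z], which averages x + y over the joint distribution, does not rely on independence. All names and numbers are illustrative.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y); the two variables are deliberately dependent.
joint_pmf = {
    (0, 0): Fraction(1, 2),
    (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(1, 4),
}


def expect(g, pmf):
    """Expected value of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


e_x = expect(lambda x, y: x, joint_pmf)
e_y = expect(lambda x, y: y, joint_pmf)
e_sum = expect(lambda x, y: x + y, joint_pmf)

print("E[X] =", e_x, " E[Y] =", e_y, " E[X + Y] =", e_sum)
```

Averaging x alone over the joint distribution is the discrete counterpart of integrating out y to obtain the marginal of X, which is the step that splits the double integral into two single ones.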


Statistical Independence 

Consider next the random variable Z, defined as the product of two independent random 
variables X and Y, whose probability density functions are respectively denoted by fx(x) 
and f Y iy)- As before, the expectation of Z is defined by 

E[Z] = f zf z (z) dz 

—00 

except that, this time, we have 

f z (z) = f x , y( x ’ y) 

= f x ( x )f Y (y) 

where, in the second line, we used the statistical independence of X and Y. With Z = XY, we 
may therefore recast the expectation E[Z] as 


E[ZT] = f 


x yf x ( x )f Y (y) Ay d >’ 


J x/ x (x) d.vj yfyiy) dy 


= EfflE[f] 

In words, we may therefore state: 


Here again, by induction, we may extend this statement to the product of many 
independent random variables. 
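The two results above can be checked exactly on a small discrete case. The following is a minimal sketch, assuming two independent fair dice as a stand-in for X and Y (the dice, the helper `expect`, and the variable names are illustrative assumptions, not part of the text); sums over the joint probability mass function replace the integrals.

```python
from fractions import Fraction
from itertools import product

# Assumed example: X and Y are independent fair dice, so each (x, y) pair has probability 1/36.
faces = range(1, 7)
p = Fraction(1, 36)

def expect(g):
    """Expectation of g(X, Y) under the joint pmf of the two dice."""
    return sum(p * g(x, y) for x, y in product(faces, faces))

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
mean_sum = expect(lambda x, y: x + y)    # E[X + Y]
mean_prod = expect(lambda x, y: x * y)   # E[XY]

print(mean_sum == mean_x + mean_y)    # linearity of expectation
print(mean_prod == mean_x * mean_y)   # product rule, which relies on independence
```

Linearity holds for any joint distribution, whereas the product rule depends on the factorization of the joint pmf; replacing the uniform weight `p` with a non-factorizable joint pmf would generally break the second equality but not the first.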

Second-Order Statistical Averages 


In the previous section we studied the mean of random variables in some detail. In this 
section, we expand on the mean by studying different second-order statistical averages. 


These statistical averages, together with the mean, complete the partial characterization
of random variables. 

To this end, let X denote a random variable and let g(X) denote a real-valued function of 
X defined on the real line. The quantity obtained by letting the argument of the function 
g(X) be a random variable is also a random variable, which we denote as 

$$Y = g(X)$$

To find the expectation of the random variable Y, we could, of course, find the probability
density function $f_Y(y)$ and then apply the standard formula

$$E[Y] = \int_{-\infty}^{\infty} y f_Y(y)\,dy$$

A simpler procedure, however, is to write

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx \tag{3.41}$$

Equation (3.41) is called the expected value rule; the validity of this rule for a continuous
random variable is addressed in Problem 3.14.


The Cosine Transformation of a Random Variable 

Let 

Y = g(X) = cos(X) 

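where X is a random variable uniformly distributed in the interval $(-\pi, \pi)$; that is,

$$
f_X(x) =
\begin{cases}
\dfrac{1}{2\pi}, & -\pi < x < \pi \\
0, & \text{otherwise}
\end{cases}
$$

According to (3.41), the expected value of Y is

$$
\begin{aligned}
E[Y] &= \int_{-\pi}^{\pi} (\cos x)\left(\frac{1}{2\pi}\right) dx \\
     &= \frac{1}{2\pi} \sin x \,\Big|_{x=-\pi}^{\pi} \\
     &= 0
\end{aligned}
$$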

This result is intuitively satisfying in light of what we know about the dependence of a 
cosine function on its argument. 
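As a numerical cross-check of the expected value rule, the sketch below approximates the integral of $g(x) f_X(x)$ with a midpoint rule. The helper `expected_value`, the step count, and the second test function $\cos^2 x$ are assumptions introduced here for illustration only.

```python
import math

def expected_value(g, pdf, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g(x) * pdf(x) over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += g(x) * pdf(x)
    return total * h

def uniform_pdf(x):
    """Density of X, uniform on (-pi, pi); only evaluated inside that interval."""
    return 1.0 / (2.0 * math.pi)

mean_cos = expected_value(math.cos, uniform_pdf, -math.pi, math.pi)
mean_cos_sq = expected_value(lambda x: math.cos(x) ** 2, uniform_pdf, -math.pi, math.pi)
print(mean_cos, mean_cos_sq)
```

The first value is zero to within rounding error, in agreement with the example. The second value illustrates that the mean-square value of Y is 1/2 even though its mean is zero, without ever deriving $f_Y(y)$.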


For the special case of $g(X) = X^n$, the application of (3.41) leads to the nth moment of the
probability distribution of a random variable X; that is,

$$E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\,dx \tag{3.42}$$




From an engineering perspective, however, the most important moments of X are the first 
two moments. Putting n = 1 in (3.42) gives the mean of the random variable, which was 
discussed in Section 3.6. Putting n = 2 gives the mean-square value of X, defined by 

$$E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx \tag{3.43}$$


We may also define central moments, which are simply the moments of the difference 
between a random variable X and its mean $\mu_X$. Thus, the nth central moment of X is

$$E[(X - \mu_X)^n] = \int_{-\infty}^{\infty} (x - \mu_X)^n f_X(x)\,dx$$

For n = 1, the central moment is, of course, zero. For n = 2, the second central moment is 
referred to as the variance of the random variable X, defined by 

$$
\begin{aligned}
\operatorname{var}[X] &= E[(X - \mu_X)^2] \\
                      &= \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx
\end{aligned}
\tag{3.45}
$$

The variance of a random variable X is commonly denoted by $\sigma_X^2$. The square root of the
variance, namely $\sigma_X$, is called the standard deviation of the random variable X.

In a sense, the variance $\sigma_X^2$ of the random variable X is a measure of the variable’s
“randomness” or “volatility.” By specifying the variance $\sigma_X^2$ we essentially constrain the
effective width of the probability density function $f_X(x)$ of the random variable X about the
mean $\mu_X$. A precise statement of this constraint is contained in the Chebyshev inequality,
which states that for any positive number $\varepsilon$, we have the probability

$$P[\,|X - \mu_X| \ge \varepsilon\,] \le \frac{\sigma_X^2}{\varepsilon^2}$$

From this inequality we see that the mean and variance of a random variable provide a 
weak description of its probability distribution; hence the practical importance of these 
two statistical averages. 

Using (3.43) and (3.45), we find that the variance $\sigma_X^2$ and the mean-square value $E[X^2]$
are related by

$$
\begin{aligned}
\sigma_X^2 &= E[X^2 - 2\mu_X X + \mu_X^2] \\
           &= E[X^2] - 2\mu_X E[X] + \mu_X^2 \\
           &= E[X^2] - \mu_X^2
\end{aligned}
\tag{3.47}
$$

where, in the second line, we used the linearity property of the statistical expectation
operator E. Equation (3.47) shows that if the mean $\mu_X$ is zero, then the variance $\sigma_X^2$ and
the mean-square value $E[X^2]$ of the random variable X are equal.
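The relation between the variance and the mean-square value, and the Chebyshev inequality stated earlier, can both be verified exactly on a small case. The sketch below assumes a fair six-sided die as the random variable (a hypothetical discrete example; the text itself deals with continuous densities), with sums over the pmf in place of integrals.

```python
from fractions import Fraction

# Assumed example: X is the outcome of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(p * x for x, p in pmf.items())                     # mu_X
mean_square = sum(p * x**2 for x, p in pmf.items())           # E[X^2]
variance = sum(p * (x - mean) ** 2 for x, p in pmf.items())   # E[(X - mu_X)^2]

def tail_probability(eps):
    """P[|X - mu_X| >= eps], computed exactly from the pmf."""
    return sum(p for x, p in pmf.items() if abs(x - mean) >= eps)

print(variance == mean_square - mean**2)   # variance equals E[X^2] minus squared mean
for eps in (1, 2, Fraction(5, 2)):
    # Chebyshev: the exact tail probability never exceeds variance / eps^2.
    print(eps, tail_probability(eps), variance / eps**2)
```

For this example the Chebyshev bound is loose (for instance, the exact tail probability at $\varepsilon = 2$ is 1/3 against a bound of 35/48), which is consistent with the remark that the mean and variance give only a weak description of the distribution.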
Thus far, we have considered the characterization of a single random variable. Consider
next a pair of random variables X and Y. In this new setting, a set of statistical averages of
importance is the joint moments, namely the expectation of $X^i Y^k$, where i and k may
assume any positive integer values. Specifically, by definition, we have

$$E[X^i Y^k] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^i y^k f_{X,Y}(x, y)\,dx\,dy$$

A joint moment of particular importance is the correlation, defined by E[XY], which
corresponds to i = k = 1 in this equation.

More specifically, the correlation of the centered random variables (X - E[X]) and
(Y - E[Y]), that is, the joint moment

$$\operatorname{cov}[XY] = E[(X - E[X])(Y - E[Y])] \tag{3.49}$$

is called the covariance of X and Y. Let $\mu_X = E[X]$ and $\mu_Y = E[Y]$; we may then expand
(3.49) to obtain the result

$$\operatorname{cov}[XY] = E[XY] - \mu_X \mu_Y \tag{3.50}$$

where we have made use of the linearity property of the expectation operator E. Let $\sigma_X^2$
and $\sigma_Y^2$ denote the variances of X and Y, respectively. Then, the covariance of X and Y,
normalized with respect to the product $\sigma_X \sigma_Y$, is called the correlation coefficient of X and
Y, expressed as

$$\rho(X, Y) = \frac{\operatorname{cov}[XY]}{\sigma_X \sigma_Y} \tag{3.51}$$

The two random variables X and Y are said to be uncorrelated if, and only if, their 
covariance is zero; that is, 

$$\operatorname{cov}[XY] = 0$$

They are said to be orthogonal if and only if their correlation is zero; that is,

$$E[XY] = 0$$

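In light of (3.50), we may therefore make the following statement:

If one or both of the random variables X and Y have zero mean, then X and Y are orthogonal
if, and only if, they are uncorrelated.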
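To make the definitions of correlation, covariance, and correlation coefficient concrete, the following sketch evaluates them for a small joint pmf. The pmf values and the helper `expect` are hypothetical choices made for illustration; sums over the pmf stand in for the double integrals of the text.

```python
import math
from fractions import Fraction as F

# Assumed joint pmf of two binary random variables X and Y (illustrative values only).
joint = {(0, 0): F(2, 5), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(2, 5)}

def expect(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
correlation = expect(lambda x, y: x * y)                    # E[XY]
covariance = expect(lambda x, y: (x - mu_x) * (y - mu_y))   # definition of cov[XY]
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
rho = float(covariance) / math.sqrt(float(var_x * var_y))   # correlation coefficient

print(covariance == correlation - mu_x * mu_y)   # expansion of the covariance
print(correlation, covariance, rho)
```

In this example the correlation E[XY] is nonzero and so is the covariance, so the two variables are neither orthogonal nor uncorrelated; the correlation coefficient evaluates to 0.6.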


Characteristic Function 


In the preceding section we showed that, given a continuous random variable X, we can
formulate the probability law defining the expectation of $X^n$ (i.e., the nth moment of X) in
terms of the probability density function $f_X(x)$, as shown in (3.42). We now introduce
another way of formulating this probability law; we do so through the characteristic
function.




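For a formal definition of this new concept, we say:

The characteristic function of a continuous random variable X, denoted by $\Phi_X(\nu)$, is
defined as the expectation of the complex exponential function $\exp(j\nu X)$; that is,

$$
\begin{aligned}
\Phi_X(\nu) &= E[\exp(j\nu X)] \\
            &= \int_{-\infty}^{\infty} f_X(x) \exp(j\nu x)\,dx
\end{aligned}
\tag{3.52}
$$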


According to the second expression on the right-hand side of (3.52), we may also view the
characteristic function $\Phi_X(\nu)$ of the random variable X as the Fourier transform of the
associated probability density function $f_X(x)$, except for a sign change in the exponent. In
this interpretation of the characteristic function we have used $\exp(j\nu x)$ rather than
$\exp(-j\nu x)$ so as to conform with the convention adopted in probability theory.

Recognizing that $\nu$ and x play roles analogous to the variables $2\pi f$ and t, respectively, in
the Fourier-transform theory, we may appeal to the Fourier-transform theory of Chapter 2
to recover the probability density function $f_X(x)$ of the random variable X given the
characteristic function $\Phi_X(\nu)$. Specifically, we may use the inversion formula to write

$$f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \Phi_X(\nu) \exp(-j\nu x)\,d\nu \tag{3.53}$$


Thus, with $f_X(x)$ and $\Phi_X(\nu)$ forming a Fourier-transform pair, we may obtain the moments
of the random variable X from the function $\Phi_X(\nu)$. To pursue this issue, we differentiate
both sides of (3.52) with respect to $\nu$ a total of n times, and then set $\nu = 0$; we thus get the
result

$$\frac{d^n}{d\nu^n} \Phi_X(\nu) \bigg|_{\nu=0} = j^n \int_{-\infty}^{\infty} x^n f_X(x)\,dx \tag{3.54}$$

The integral on the right-hand side of this relation is recognized as the nth moment of the
random variable X. Accordingly, we may recast (3.54) in the equivalent form

$$E[X^n] = (-j)^n \frac{d^n}{d\nu^n} \Phi_X(\nu) \bigg|_{\nu=0} \tag{3.55}$$

This equation is a mathematical statement of the so-called moment theorem. Indeed, it is
because of (3.55) that the characteristic function $\Phi_X(\nu)$ is also referred to as a
moment-generating function.

Exponential Distribution 

The exponential distribution is defined by

$$
f_X(x) =
\begin{cases}
\lambda \exp(-\lambda x), & x \ge 0 \\
0, & \text{otherwise}
\end{cases}
$$

where $\lambda$ is the only parameter of the distribution. The characteristic function of the
distribution is therefore

$$
\begin{aligned}
\Phi_X(\nu) &= \int_0^{\infty} \lambda \exp(-\lambda x) \exp(j\nu x)\,dx \\
            &= \frac{\lambda}{\lambda - j\nu}
\end{aligned}
$$

We wish to use this result to find the mean of the exponentially distributed random
variable X. To do this evaluation, we differentiate the characteristic function $\Phi_X(\nu)$ with
respect to $\nu$ once, obtaining

$$\Phi_X'(\nu) = \frac{j\lambda}{(\lambda - j\nu)^2}$$

where the prime in $\Phi_X'(\nu)$ signifies first-order differentiation with respect to the
argument $\nu$. Hence, applying the moment theorem of (3.55), we get the desired result

$$
\begin{aligned}
E[X] &= -j\,\Phi_X'(\nu) \Big|_{\nu=0} \\
     &= \frac{1}{\lambda}
\end{aligned}
$$
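The moment theorem can also be exercised numerically. The sketch below, a rough check rather than a general-purpose routine, approximates the first derivative of the exponential characteristic function at $\nu = 0$ by a central difference and applies the theorem with n = 1; the function names and the step size `h` are assumptions made here.

```python
def char_fn(v, lam):
    """Characteristic function of the exponential distribution: lam / (lam - j v)."""
    return lam / (lam - 1j * v)

def mean_from_char_fn(lam, h=1e-5):
    """Mean obtained from the moment theorem with n = 1, using a numerical derivative."""
    # Central difference approximates the derivative of the characteristic function at v = 0.
    derivative = (char_fn(h, lam) - char_fn(-h, lam)) / (2 * h)
    return (-1j * derivative).real

for lam in (0.5, 2.0, 4.0):
    print(lam, mean_from_char_fn(lam), 1 / lam)
```

Each printed pair agrees closely with $1/\lambda$, matching the closed-form mean derived in the example.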


The Gaussian Distribution 


Among the many distributions studied in the literature on probability theory, the Gaussian 
distribution stands out, by far, as the most commonly used distribution in the statistical 
analysis of communications systems, for reasons that will become apparent in Section 
3.10. Let X denote a continuous random variable; the variable X is said to be Gaussian 
distributed if its probability density function has the general form 

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] \tag{3.58}$$

where $\mu$ and $\sigma$ are two scalar parameters that characterize the distribution. The parameter
$\mu$ can assume both positive and negative values (including zero), whereas the parameter $\sigma$
is always positive. Under these two conditions, the $f_X(x)$ of (3.58) satisfies all the
properties of a probability density function, including the normalization property; namely,

$$\frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] dx = 1$$

A Gaussian random variable has many important properties, four of which are 
summarized on the next two pages. 
Mean and Variance

In the defining equation (3.58), the parameter $\mu$ is the mean of the Gaussian random variable X and
$\sigma^2$ is its variance. We may therefore state:

A Gaussian random variable is uniquely defined by specifying its mean and variance.


Linear Function of a Gaussian Random Variable 

Let X be a Gaussian random variable with mean $\mu$ and variance $\sigma^2$. Define a new random
variable

$$Y = aX + b$$

where a and b are scalars and $a \neq 0$. Then Y is also Gaussian with mean

$$E[Y] = a\mu + b$$

and variance

$$\operatorname{var}[Y] = a^2 \sigma^2$$

In words, we may state:

Gaussianity is preserved by a linear transformation.

Sum of Independent Gaussian Random Variables 


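Let X and Y be independent Gaussian random variables with means $\mu_X$ and $\mu_Y$,
respectively, and variances $\sigma_X^2$ and $\sigma_Y^2$, respectively. Define a new random variable

$$Z = X + Y$$

The random variable Z is also Gaussian with mean

$$E[Z] = \mu_X + \mu_Y$$

and variance

$$\operatorname{var}[Z] = \sigma_X^2 + \sigma_Y^2$$

In general, we may therefore state:

The sum of independent Gaussian random variables is also a Gaussian random variable, whose
mean and variance are respectively equal to the sum of the means and the sum of the variances
of the constituent random variables.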
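The mean and variance of the sum can be checked by simulation. The sketch below is a Monte Carlo illustration under assumed parameter values (the means, standard deviations, sample count, and seed are arbitrary choices); it checks only the first two moments of Z and does not by itself establish that Z is Gaussian.

```python
import random
import statistics

rng = random.Random(0)          # fixed seed so the run is repeatable
num_samples = 50_000

# Assumed parameters for the two independent Gaussian random variables.
mu_x, sigma_x = 1.0, 2.0
mu_y, sigma_y = -3.0, 1.5

z = [rng.gauss(mu_x, sigma_x) + rng.gauss(mu_y, sigma_y) for _ in range(num_samples)]

sample_mean = statistics.fmean(z)
sample_var = statistics.pvariance(z)

print(sample_mean, mu_x + mu_y)               # sample mean versus sum of the means
print(sample_var, sigma_x**2 + sigma_y**2)    # sample variance versus sum of the variances
```

With these assumed values the sample mean should lie close to -2 and the sample variance close to 6.25, up to the usual statistical fluctuation of a finite sample.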


Jointly Gaussian Random Variables 

Let X and Y be a pair of jointly Gaussian random variables with zero means and variances
$\sigma_X^2$ and $\sigma_Y^2$, respectively. The joint probability density function of X and Y is completely
determined by $\sigma_X$, $\sigma_Y$, and $\rho$, where $\rho$ is the correlation coefficient defined in (3.51).
Specifically, we have

$$f_{X,Y}(x, y) = c \exp(-q(x, y))$$

where the normalization constant c is defined by

$$c = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}}$$

and the exponential term is defined by

$$q(x, y) = \frac{1}{2(1 - \rho^2)} \left( \frac{x^2}{\sigma_X^2} - 2\rho \frac{xy}{\sigma_X \sigma_Y} + \frac{y^2}{\sigma_Y^2} \right)$$

In the special case where the correlation coefficient $\rho$ is zero, the joint probability density
function of X and Y assumes the simple form

$$
\begin{aligned}
f_{X,Y}(x, y) &= \frac{1}{2\pi \sigma_X \sigma_Y} \exp\left( -\frac{x^2}{2\sigma_X^2} - \frac{y^2}{2\sigma_Y^2} \right) \\
              &= f_X(x) f_Y(y)
\end{aligned}
$$

Accordingly, we may make the statement:

If the jointly Gaussian random variables X and Y have zero means and are orthogonal (that is,
E[XY] = 0, so that $\rho = 0$), then they are statistically independent.


By virtue of Gaussianity, this statement is stronger than the last statement made at the end 
of the subsection on covariance. 


In light of Property 1, the notation $\mathcal{N}(\mu, \sigma^2)$ is commonly used as the shorthand
description of a Gaussian distribution parameterized in terms of its mean $\mu$ and variance
$\sigma^2$. The symbol $\mathcal{N}$ is used in recognition of the fact that the Gaussian distribution is also
referred to as the normal distribution, particularly in the mathematics literature.

When $\mu = 0$ and $\sigma^2 = 1$, the probability density function of (3.58) reduces to the special
form:

$$f_X(x) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) \tag{3.66}$$

A Gaussian random variable X so described is said to be in its standard form.
Correspondingly, the distribution function of the standard Gaussian random variable is
defined by

$$F_X(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left( -\frac{t^2}{2} \right) dt \tag{3.67}$$



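Owing to the frequent use of integrals of the type described in (3.67), several related
functions have been defined and tabulated in the literature. The related function commonly
used in the context of communication systems is the Q-function, which is formally defined as

$$
\begin{aligned}
Q(x) &= 1 - F_X(x) \\
     &= \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\left( -\frac{t^2}{2} \right) dt
\end{aligned}
\tag{3.68}
$$

In words, we may describe the Q-function as follows:

The Q-function, Q(x), is equal to the area under the tail of the standard Gaussian probability
density function that extends from x to infinity.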


Unfortunately, the integral of (3.67) defining the standard Gaussian distribution $F_X(x)$ does
not have a closed-form solution. Rather, with accuracy being an issue of importance, $F_X(x)$ is
usually presented in the form of a table for varying x. Table 3.1 is one such recording. To
utilize this table for calculating the Q-function, we build on two defining equations:

1. For nonnegative values of x, the first line of (3.68) is used.

2. For negative values of x, use is made of the symmetric property of the Q-function:

$$Q(-x) = 1 - Q(x)$$
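Where a table is not at hand, the same quantities can be evaluated numerically. The sketch below relies on the standard identity $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$, which is not stated in the text and is introduced here as an assumption of the example; `math.erfc` is the complementary error function from the Python standard library, and the two function names are illustrative.

```python
import math

def gaussian_cdf(x):
    """Standard Gaussian distribution function F_X(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def q_function(x):
    """Q(x) = 1 - F_X(x), the upper-tail probability of the standard Gaussian."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

for x in (0.0, 1.0, 2.0, 3.0):
    print(x, round(gaussian_cdf(x), 4), q_function(x))

# Symmetry property used for negative arguments: Q(-x) = 1 - Q(x).
print(q_function(-1.5), 1.0 - q_function(1.5))
```

Evaluating Q(x) directly through `erfc`, rather than as one minus the distribution function, avoids loss of precision for large x, where Q(x) is very small.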


To visualize the graphical formats of the commonly used standard Gaussian functions,
$F_X(x)$, $f_X(x)$, and Q(x), three plots are presented:

Figure 3.10a plots the distribution function, $F_X(x)$, defined in (3.67).

Figure 3.10b plots the density function, $f_X(x)$, defined in (3.66).

Figure 3.11 plots the Q-function defined in (3.68).

[Figure 3.10: The normalized Gaussian (a) distribution function and (b) probability density function.]

[Figure 3.11: The Q-function.]
Table 3.1 The standard Gaussian distribution (Q-function) table

x      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
0.3   .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
0.4   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
0.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
0.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
0.7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
0.8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
0.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
1.0   .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
1.1   .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
1.2   .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
1.3   .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
1.4   .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
1.5   .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
1.6   .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
1.7   .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
1.8   .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
1.9   .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
2.0   .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
2.1   .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
2.2   .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
2.3   .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
2.4   .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
2.5   .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7   .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8   .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
2.9   .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
3.0   .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
3.1   .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
3.2   .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9995  .9995  .9995
3.3   .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
3.4   .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998

1. The argument x spans the range [0.0, 3.49]; x is a sample value of the random variable X. The row label gives x to one decimal place and the column label supplies the second decimal place.

2. For each value of x, the table lists the distribution function $F_X(x)$, from which the corresponding value of the Q-function follows as $Q(x) = 1 - F_X(x)$.

The Central Limit Theorem


The central limit theorem occupies an important place in probability theory: it provides the
mathematical justification for using the Gaussian distribution as a model for an observed
random variable that is known to be the result of a large number of random events.

For a formal statement of the central limit theorem, let $X_1, X_2, \ldots, X_n$ denote a sequence of
independently and identically distributed (iid) random variables with common mean $\mu$ and
variance $\sigma^2$. Define the related random variable

$$Y_n = \frac{1}{\sigma\sqrt{n}} \left( \sum_{i=1}^{n} X_i - n\mu \right) \tag{3.70}$$

The subtraction of the product term $n\mu$ from the sum $\sum_{i=1}^{n} X_i$ ensures that the random
variable $Y_n$ has zero mean; the division by the factor $\sigma\sqrt{n}$ ensures that $Y_n$ has unit variance.


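Given the setting described in (3.70), the central limit theorem formally states:

As the number of random variables n in (3.70) approaches infinity, the normalized random
variable $Y_n$ converges in distribution to the standard Gaussian random variable; that is,

$$\lim_{n \to \infty} P[Y_n > y] = Q(y)$$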


To appreciate the practical importance of the central limit theorem, suppose that we have a 
physical phenomenon whose occurrence is attributed to a large number of random events. 
The theorem, embodying (3.67) through (3.71), permits us to calculate certain probabilities
simply by referring to a Q-function table (e.g., Table 3.1). Moreover, to perform the
calculation, all that we need to know are means and variances.

However, a word of caution is in order here. The central limit theorem gives only the
“limiting” form of the probability distribution of the standardized random variable $Y_n$ as n
approaches infinity. When n is finite, it is sometimes found that the Gaussian limit
provides a relatively poor approximation for the actual probability distribution of $Y_n$, even
though n may be large.

Sum of Uniformly Distributed Random Variables 

Consider the random variable 


$$Y_n = \sum_{i=1}^{n} X_i$$

where the $X_i$ are independent and uniformly distributed random variables on the interval
from -1 to +1. Suppose that we generate 20000 samples of the random variable $Y_n$ for
n = 10, and then compute the probability density function of $Y_n$ by forming a histogram of
the results. Figure 3.11a compares the computed histogram (scaled for unit area) with the
probability density function of a Gaussian random variable with the same mean and
variance. The figure clearly illustrates that in this particular example the number n of
independent random variables in the sum does not have to be large for $Y_n$ to closely
approximate a Gaussian distribution. Indeed, the results of this example confirm how
powerful the central limit theorem is. Moreover, the results explain why Gaussian models
are so ubiquitous in the analysis of random signals not only in the study of communication
systems, but also in so many other disciplines.
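The experiment described in this example can be reproduced along the following lines. This is a sketch, not the code behind the figure: the seed and the summary statistics are choices made here, and instead of plotting a histogram it compares the fraction of samples within one standard deviation of the mean against the Gaussian value of about 0.6827.

```python
import math
import random
import statistics

rng = random.Random(1)            # fixed seed so the run is repeatable
n, num_samples = 10, 20_000

# Each sample of Y_n is the sum of n independent uniform variables on [-1, +1].
samples = [sum(rng.uniform(-1.0, 1.0) for _ in range(n)) for _ in range(num_samples)]

mean = statistics.fmean(samples)
variance = statistics.pvariance(samples)

# Each uniform term has zero mean and variance 1/3, so Y_n has variance n/3.
sigma = math.sqrt(n / 3.0)
within_one_sigma = sum(abs(s) <= sigma for s in samples) / num_samples

print(mean, variance, n / 3.0)
print(within_one_sigma)           # a Gaussian with the same variance gives about 0.6827
```

The sample mean and variance should land near 0 and 10/3, and the one-sigma fraction near the Gaussian value, which is the numerical counterpart of the close agreement reported for the histogram.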



[Figure: Simulation supporting validity of the central limit theorem.]


Bayesian Inference 


The material covered up to this point in the chapter has largely addressed issues involved 
in the mathematical description of probabilistic models. In the remaining part of the 
chapter we will study the role of probability theory in probabilistic reasoning based on the 
Bayesian paradigm, which occupies a central place in statistical communication theory. 

To proceed with the discussion, consider Figure 3.12, which depicts two finite-dimensional
spaces: a parameter space and an observation space, with the parameter
space being hidden from the observer. A parameter vector $\boldsymbol{\theta}$, drawn from the parameter
space, is mapped probabilistically onto the observation space, producing the observation
vector $\mathbf{x}$. The vector $\mathbf{x}$ is the sample value of a random vector $\mathbf{X}$, which provides the
observer information about $\boldsymbol{\theta}$. Given the probabilistic scenario depicted in Figure 3.12, we
may identify two different operations that are the dual of each other.

[Figure 3.12: Probabilistic model for Bayesian inference, showing the parameter space and the observation space.]

1. Probabilistic modeling. The aim of this operation is to formulate the conditional
probability density function $f_{\mathbf{X}|\boldsymbol{\Theta}}(\mathbf{x}|\boldsymbol{\theta})$, which provides an adequate description of
the underlying physical behavior of the observation space.

2. Statistical analysis. The aim of this second operation is the inverse of probabilistic
modeling, for which we need the conditional probability density function
$f_{\boldsymbol{\Theta}|\mathbf{X}}(\boldsymbol{\theta}|\mathbf{x})$.

In a fundamental sense, statistical analysis is more profound than probabilistic modeling.
We may justify this assertion by viewing the unknown parameter vector $\boldsymbol{\theta}$ as the cause for
the physical behavior of the observation space and viewing the observation vector $\mathbf{x}$ as the
effect. In essence, statistical analysis solves an inverse problem by retrieving the causes
(i.e., the parameter vector $\boldsymbol{\theta}$) from the effects (i.e., the observation vector $\mathbf{x}$). Indeed, we
may go on to say that whereas probabilistic modeling helps us to characterize the future
behavior of $\mathbf{x}$ conditional on $\boldsymbol{\theta}$, statistical analysis permits us to make inference about $\boldsymbol{\theta}$
given $\mathbf{x}$.

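To formulate the conditional probability density function $f_{\boldsymbol{\Theta}|\mathbf{X}}(\boldsymbol{\theta}|\mathbf{x})$, we recast
Bayes’ theorem of (3.14) in its continuous version, as shown by

$$f_{\boldsymbol{\Theta}|\mathbf{X}}(\boldsymbol{\theta}|\mathbf{x}) = \frac{f_{\mathbf{X}|\boldsymbol{\Theta}}(\mathbf{x}|\boldsymbol{\theta})\, f_{\boldsymbol{\Theta}}(\boldsymbol{\theta})}{f_{\mathbf{X}}(\mathbf{x})} \tag{3.72}$$

The denominator is itself defined in terms of the numerator as

$$
\begin{aligned}
f_{\mathbf{X}}(\mathbf{x}) &= \int_{\boldsymbol{\Theta}} f_{\mathbf{X}|\boldsymbol{\Theta}}(\mathbf{x}|\boldsymbol{\theta})\, f_{\boldsymbol{\Theta}}(\boldsymbol{\theta})\,d\boldsymbol{\theta} \\
                           &= \int_{\boldsymbol{\Theta}} f_{\mathbf{X},\boldsymbol{\Theta}}(\mathbf{x}, \boldsymbol{\theta})\,d\boldsymbol{\theta}
\end{aligned}
\tag{3.73}
$$

which is the marginal density of $\mathbf{X}$, obtained by integrating out the dependence of the joint
probability density function $f_{\mathbf{X},\boldsymbol{\Theta}}(\mathbf{x}, \boldsymbol{\theta})$ on $\boldsymbol{\theta}$. In words, $f_{\mathbf{X}}(\mathbf{x})$ is a marginal density of the joint
probability density function $f_{\mathbf{X},\boldsymbol{\Theta}}(\mathbf{x}, \boldsymbol{\theta})$. The inversion formula of (3.72) is sometimes
referred to as the principle of inverse probability.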

In light of this principle, we may now introduce four notions:

1. Observation density. This stands for the conditional probability density function
$f_{\mathbf{X}|\boldsymbol{\Theta}}(\mathbf{x}|\boldsymbol{\theta})$, referring to the “observation” vector $\mathbf{x}$ given the parameter vector $\boldsymbol{\theta}$.

2. Prior. This stands for the probability density function $f_{\boldsymbol{\Theta}}(\boldsymbol{\theta})$, referring to the
parameter vector $\boldsymbol{\theta}$ “prior” to receiving the observation vector $\mathbf{x}$.

3. Posterior. This stands for the conditional probability density function $f_{\boldsymbol{\Theta}|\mathbf{X}}(\boldsymbol{\theta}|\mathbf{x})$,
referring to the parameter vector $\boldsymbol{\theta}$ “after” receiving the observation vector $\mathbf{x}$.

4. Evidence. This stands for the probability density function $f_{\mathbf{X}}(\mathbf{x})$, referring to the
“information” contained in the observation vector $\mathbf{x}$ for statistical analysis.

The posterior $f_{\boldsymbol{\Theta}|\mathbf{X}}(\boldsymbol{\theta}|\mathbf{x})$ is central to Bayesian inference. In particular, we may view it as
the updating of information available on the parameter vector $\boldsymbol{\theta}$ in light of the information
contained in the observation vector $\mathbf{x}$, while the prior $f_{\boldsymbol{\Theta}}(\boldsymbol{\theta})$ is the information available on
$\boldsymbol{\theta}$ prior to receiving the observation vector $\mathbf{x}$.


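The inversion aspect of statistics manifests itself in the notion of the likelihood function.
In a formal sense, the likelihood, denoted by $l(\boldsymbol{\theta}|\mathbf{x})$, is just the observation density
$f_{\mathbf{X}|\boldsymbol{\Theta}}(\mathbf{x}|\boldsymbol{\theta})$ reformulated in a different order, as shown by

$$l(\boldsymbol{\theta}|\mathbf{x}) = f_{\mathbf{X}|\boldsymbol{\Theta}}(\mathbf{x}|\boldsymbol{\theta}) \tag{3.74}$$

The important point to note here is that the likelihood and the observation density are both
governed by exactly the same function that involves the parameter vector $\boldsymbol{\theta}$ and the
observation vector $\mathbf{x}$. There is, however, a difference in interpretation: the likelihood function
$l(\boldsymbol{\theta}|\mathbf{x})$ is treated as a function of the parameter vector $\boldsymbol{\theta}$ given $\mathbf{x}$, whereas the observation
density $f_{\mathbf{X}|\boldsymbol{\Theta}}(\mathbf{x}|\boldsymbol{\theta})$ is treated as a function of the observation vector $\mathbf{x}$ given $\boldsymbol{\theta}$.

Note, however, that unlike $f_{\mathbf{X}|\boldsymbol{\Theta}}(\mathbf{x}|\boldsymbol{\theta})$, the likelihood $l(\boldsymbol{\theta}|\mathbf{x})$ is not a distribution; rather, it is
a function of the parameter vector $\boldsymbol{\theta}$, given $\mathbf{x}$.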

In light of the terminologies introduced, namely the posterior, prior, likelihood, and 
evidence, we may now express Bayes’ rule of (3.72) in words as follows: 

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$
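The word equation above translates directly into a few lines of code when the parameter takes finitely many values. The sketch below assumes a hypothetical binary parameter with an arbitrary prior and likelihood (the numbers are illustrative only); the integral that defines the evidence becomes a sum.

```python
from fractions import Fraction as F

# Assumed prior on a binary parameter theta, and assumed likelihood of observing x = 1.
prior = {0: F(7, 10), 1: F(3, 10)}
likelihood_x1 = {0: F(1, 5), 1: F(9, 10)}   # probability of x = 1 given theta

def bayes_update(prior, likelihood):
    """Return (posterior, evidence) with posterior = likelihood * prior / evidence."""
    evidence = sum(likelihood[t] * prior[t] for t in prior)
    posterior = {t: likelihood[t] * prior[t] / evidence for t in prior}
    return posterior, evidence

posterior, evidence = bayes_update(prior, likelihood_x1)
print(evidence)    # normalizing constant, independent of theta
print(posterior)   # updated information on theta after observing x = 1
```

Observing x = 1 raises the probability of theta = 1 from the prior value 3/10 to 27/41, while the evidence 41/100 serves only to normalize the result.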


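For convenience of presentation, let

$$\pi(\boldsymbol{\theta}) = f_{\boldsymbol{\Theta}}(\boldsymbol{\theta}) \tag{3.75}$$

Then, recognizing that the evidence defined in (3.73) plays merely the role of a
normalizing function that is independent of $\boldsymbol{\theta}$, we may now sum up (3.72) on the principle
of inverse probability succinctly as follows:

The posterior density is proportional to the product of the likelihood function $l(\boldsymbol{\theta}|\mathbf{x})$ and
the prior $\pi(\boldsymbol{\theta})$.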
To elaborate on the significance of the defining equation (3.74), consider the likelihood
functions $l(\boldsymbol{\theta}|\mathbf{x}_1)$ and $l(\boldsymbol{\theta}|\mathbf{x}_2)$ on parameter vector $\boldsymbol{\theta}$. If, for a prescribed prior $\pi(\boldsymbol{\theta})$, these
two likelihood functions are scaled versions of each other, then the corresponding
posterior densities of $\boldsymbol{\theta}$ are essentially identical, the validity of which is a straightforward
consequence of Bayes’ theorem. In light of this result we may now formulate the so-called
likelihood principle as follows:

If $\mathbf{x}_1$ and $\mathbf{x}_2$ are two observation vectors depending on an unknown parameter vector $\boldsymbol{\theta}$,
such that

$$l(\boldsymbol{\theta}|\mathbf{x}_1) = c\,l(\boldsymbol{\theta}|\mathbf{x}_2) \quad \text{for all } \boldsymbol{\theta}$$

where c is a scaling factor, then these two observation vectors lead to an identical
inference on $\boldsymbol{\theta}$ for any prescribed prior $f_{\boldsymbol{\Theta}}(\boldsymbol{\theta})$.
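The likelihood principle is easy to confirm on a discretized parameter. In the sketch below, the three-point parameter grid, the prior, the likelihood values, and the scaling factor are all hypothetical; the point is only that the scaling factor cancels in the normalization.

```python
from fractions import Fraction as F

def posterior(likelihood, prior):
    """Normalize likelihood * prior over a finite grid of parameter values."""
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]

prior = [F(1, 2), F(1, 3), F(1, 6)]              # assumed prior over three parameter values
likelihood_2 = [F(1, 10), F(3, 10), F(6, 10)]    # assumed likelihood for observation x_2
c = F(5, 2)                                      # assumed scaling factor
likelihood_1 = [c * v for v in likelihood_2]     # likelihood for observation x_1

print(posterior(likelihood_1, prior) == posterior(likelihood_2, prior))
```

Note that the scaled likelihood takes a value greater than one at the third grid point, which is permissible because the likelihood is not a distribution over the parameter.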


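Consider a model, parameterized by the vector $\boldsymbol{\theta}$ and given the observation vector $\mathbf{x}$. In
statistical terms, the model is described by the posterior density $f_{\boldsymbol{\Theta}|\mathbf{X}}(\boldsymbol{\theta}|\mathbf{x})$. In this context,
we may now introduce a function $\mathbf{t}(\mathbf{x})$, which is said to be a sufficient statistic if the
probability density function of the parameter vector $\boldsymbol{\theta}$ given $\mathbf{t}(\mathbf{x})$ satisfies the condition

$$f_{\boldsymbol{\Theta}|\mathbf{X}}(\boldsymbol{\theta}|\mathbf{x}) = f_{\boldsymbol{\Theta}|\mathbf{T}(\mathbf{x})}(\boldsymbol{\theta}|\mathbf{t}(\mathbf{x})) \tag{3.76}$$

This condition imposed on $\mathbf{t}(\mathbf{x})$, for it to be a sufficient statistic, appears intuitively
appealing, as evidenced by the following statement:

The function $\mathbf{t}(\mathbf{x})$ summarizes the observation vector $\mathbf{x}$ with no loss of information about
the parameter vector $\boldsymbol{\theta}$.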


We may thus view the notion of sufficient statistic as a tool for “data reduction,” the use of 
which results in considerable simplification in analysis. The data reduction power of the 
sufficient statistic t(x) is well illustrated in Example 7. 

Parameter Estimation 


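As pointed out previously, the posterior density $f_{\boldsymbol{\Theta}|\mathbf{X}}(\boldsymbol{\theta}|\mathbf{x})$ is central to the formulation of
a Bayesian probabilistic model, where $\boldsymbol{\theta}$ is an unknown parameter vector and $\mathbf{x}$ is the
observation vector. It is logical, therefore, that we use this conditional probability density
function for parameter estimation. Accordingly, we define the maximum a posteriori
(MAP) estimate of $\boldsymbol{\theta}$ as

$$
\begin{aligned}
\hat{\boldsymbol{\theta}}_{\text{MAP}} &= \arg\max_{\boldsymbol{\theta}} f_{\boldsymbol{\Theta}|\mathbf{X}}(\boldsymbol{\theta}|\mathbf{x}) \\
                                       &= \arg\max_{\boldsymbol{\theta}} l(\boldsymbol{\theta}|\mathbf{x})\,\pi(\boldsymbol{\theta})
\end{aligned}
\tag{3.77}
$$

where $l(\boldsymbol{\theta}|\mathbf{x})$ is the likelihood function defined in (3.74), and $\pi(\boldsymbol{\theta})$ is the prior defined in
(3.75). To compute the estimate $\hat{\boldsymbol{\theta}}_{\text{MAP}}$, we require availability of the prior $\pi(\boldsymbol{\theta})$.

In words, the right-hand side of (3.77) reads as follows:

Given the observation vector $\mathbf{x}$, the estimate $\hat{\boldsymbol{\theta}}_{\text{MAP}}$ is that particular value of the parameter
vector $\boldsymbol{\theta}$ for which the posterior density $f_{\boldsymbol{\Theta}|\mathbf{X}}(\boldsymbol{\theta}|\mathbf{x})$ attains its maximum value.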


Generalizing the statement made at the end of the discussion on multiple random variables
in Section 3.5, we may now go on to say that, for the problem at hand, the conditional
probability density function $f_{\boldsymbol{\Theta}|\mathbf{X}}(\boldsymbol{\theta}|\mathbf{x})$ contains all the conceivable information about the
multidimensional parameter vector $\boldsymbol{\theta}$ given the observation vector $\mathbf{x}$. The recognition of
this fact leads us to make the follow-up important statement, illustrated in Figure 3.13 for
the simple case of a one-dimensional parameter vector:


In referring to $\hat{\boldsymbol{\theta}}_{\text{MAP}}$ as the MAP estimate, we have made a slight change in our
terminology: we have, in effect, referred to $f_{\boldsymbol{\Theta}|\mathbf{X}}(\boldsymbol{\theta}|\mathbf{x})$ as the a posteriori density rather
than the posterior density of $\boldsymbol{\theta}$. We have made this minor change so as to conform to the
MAP terminology that is well and truly embedded in the literature on statistical 
communication theory. 

In another approach to parameter estimation, known as maximum likelihood estimation,
the parameter vector $\boldsymbol{\theta}$ is estimated using the formula

$$\hat{\boldsymbol{\theta}}_{\text{ML}} = \arg\sup_{\boldsymbol{\theta}} l(\boldsymbol{\theta}|\mathbf{x}) \tag{3.78}$$

That is, the maximum likelihood estimate $\hat{\boldsymbol{\theta}}_{\text{ML}}$ is that value of the parameter vector $\boldsymbol{\theta}$ that
maximizes the conditional distribution $f_{\mathbf{X}|\boldsymbol{\Theta}}(\mathbf{x}|\boldsymbol{\theta})$ at the observation vector $\mathbf{x}$. Note that
this second estimate ignores the prior $\pi(\boldsymbol{\theta})$ and, therefore, lies at the fringe of the Bayesian
paradigm. Nevertheless, maximum likelihood estimation is widely used in the literature on
statistical communication theory, largely because in ignoring the prior $\pi(\boldsymbol{\theta})$, it is less
demanding than maximum a posteriori estimation in computational complexity.



[Figure 3.13: Illustrating the a posteriori density $f_{\boldsymbol{\Theta}|\mathbf{X}}(\boldsymbol{\theta}|\mathbf{x})$ for the case of a one-dimensional
parameter space.]
The MAP and ML estimates do share a common possibility, in that the maximizations
in (3.77) and (3.78) may lead to more than one global maximum. However, they do differ
in one important result: the maximization indicated in (3.78) may not always be possible;
that is, the procedure used to perform the maximization may diverge. To overcome this
difficulty, the solution to (3.78) has to be stabilized by incorporating prior information on
the parameter space, exemplified by the distribution $\pi(\boldsymbol{\theta})$, into the solution, which brings
us back to the Bayesian approach and, therefore, (3.77). The most critical part in the
Bayesian approach to statistical modeling and parameter estimation is how to choose the
prior $\pi(\boldsymbol{\theta})$. There is also the possibility of the Bayesian approach requiring
high-dimensional computations. We should not, therefore, underestimate the challenges
involved in applying the Bayesian approach, on which note we may say the following:

Parameter Estimation in Additive Noise 

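Consider a set of N scalar observations, defined by

$$x_i = \theta + n_i, \quad i = 1, 2, \ldots, N$$

where the unknown parameter $\theta$ is drawn from the Gaussian distribution $\mathcal{N}(0, \sigma_\theta^2)$; that is,

$$f_\Theta(\theta) = \frac{1}{\sqrt{2\pi}\,\sigma_\theta} \exp\left( -\frac{\theta^2}{2\sigma_\theta^2} \right)$$

Each $n_i$ is drawn from another Gaussian distribution $\mathcal{N}(0, \sigma_n^2)$; that is,

$$f_{N_i}(n_i) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left( -\frac{n_i^2}{2\sigma_n^2} \right), \quad i = 1, 2, \ldots, N$$

It is assumed that the random variables $N_i$ are all independent of each other, and also
independent of $\theta$. The issue of interest is to find the MAP estimate of the parameter $\theta$.

To find the distribution of the random variable $X_i$, we invoke Property 2 of the Gaussian
distribution, described in Section 3.9, in light of which we may say that $X_i$ is also Gaussian
with mean $\theta$ and variance $\sigma_n^2$. Furthermore, since the $N_i$ are independent, by assumption,
it follows that the $X_i$ are also independent. Hence, using the vector $\mathbf{x}$ to denote the N
observations, we express the observation density of $\mathbf{x}$ as

$$f_{\mathbf{X}|\Theta}(\mathbf{x}|\theta) = \frac{1}{(\sqrt{2\pi}\,\sigma_n)^N} \exp\left[ -\frac{1}{2\sigma_n^2} \sum_{i=1}^{N} (x_i - \theta)^2 \right]$$

The problem is to determine the MAP estimate of the unknown parameter $\theta$.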
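To solve this problem, we need to know the posterior density $f_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$. Applying
(3.72), we write

$$f_{\Theta|\mathbf{X}}(\theta|\mathbf{x}) = c(\mathbf{x}) \exp\left[ -\frac{1}{2} \left( \frac{\theta^2}{\sigma_\theta^2} + \frac{1}{\sigma_n^2} \sum_{i=1}^{N} (x_i - \theta)^2 \right) \right] \tag{3.82}$$

where

$$c(\mathbf{x}) = \frac{1}{\sqrt{2\pi}\,\sigma_\theta \, (\sqrt{2\pi}\,\sigma_n)^N \, f_{\mathbf{X}}(\mathbf{x})}$$

The normalization factor $c(\mathbf{x})$ is independent of the parameter $\theta$ and, therefore, has no
relevance to the MAP estimate of $\theta$. We therefore need only pay attention to the exponent in (3.82).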

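Rearranging terms and completing the square in the exponent in (3.82), and introducing
a new normalization factor $c'(\mathbf{x})$ that absorbs all the remaining terms involving only the $x_i$, we get

$$f_{\Theta|\mathbf{X}}(\theta|\mathbf{x}) = c'(\mathbf{x}) \exp\left\{ -\frac{1}{2\sigma_p^2} \left[ \frac{\sigma_\theta^2}{\sigma_\theta^2 + (\sigma_n^2/N)} \left( \frac{1}{N} \sum_{i=1}^{N} x_i \right) - \theta \right]^2 \right\} \tag{3.84}$$

where

$$\sigma_p^2 = \frac{\sigma_\theta^2 \sigma_n^2}{N\sigma_\theta^2 + \sigma_n^2}$$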


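Equation (3.84) shows that the posterior density of the unknown parameter $\theta$ is Gaussian
with variance $\sigma_p^2$ and with mean equal to the term from which $\theta$ is subtracted inside the square
brackets. Because a Gaussian density attains its maximum at its mean, we readily find that
the MAP estimate of $\theta$ is

$$\hat{\theta}_{\text{MAP}} = \frac{\sigma_\theta^2}{\sigma_\theta^2 + (\sigma_n^2/N)} \left( \frac{1}{N} \sum_{i=1}^{N} x_i \right)$$

which is the desired result.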
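The closed-form estimate can be compared against a brute-force maximization of the posterior exponent in (3.82). The sketch below uses made-up observations and variances (all numerical values, function names, and the grid resolution are assumptions of this illustration); the grid search is a deliberately crude check, not a recommended estimator.

```python
def map_estimate(x, var_theta, var_n):
    """Closed-form MAP estimate: shrinkage factor times the sample mean."""
    n = len(x)
    sample_mean = sum(x) / n
    return var_theta / (var_theta + var_n / n) * sample_mean

def log_posterior(theta, x, var_theta, var_n):
    """Exponent of the posterior in (3.82), up to a constant independent of theta."""
    return -0.5 * (theta**2 / var_theta + sum((xi - theta) ** 2 for xi in x) / var_n)

x = [0.9, 1.4, 0.7, 1.2, 1.1]     # assumed observations
var_theta, var_n = 4.0, 1.0       # assumed prior and noise variances

closed_form = map_estimate(x, var_theta, var_n)

# Crude numerical maximization over a grid of candidate values with spacing 1e-4.
grid = [k / 10000 for k in range(-30000, 30001)]
numeric = max(grid, key=lambda t: log_posterior(t, x, var_theta, var_n))

print(closed_form, numeric)
print(map_estimate([1.06] * 5, var_theta, var_n))   # same sum of observations, same estimate
print(map_estimate(x, 1e12, var_n), sum(x) / len(x))
```

The last two lines illustrate two features of the result: the estimate depends on the observations only through their sum, and as the prior variance grows very large the MAP estimate approaches the sample mean, which is the value obtained when the prior is ignored.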

Examining (3.84), we also see that the N observations enter the posterior density of $\theta$
only through the sum of the $x_i$. It follows, therefore, that

$$t(\mathbf{x}) = \sum_{i=1}^{N} x_i \tag{3.87}$$

is a sufficient statistic for the example at hand. This statement merely confirms that (3.84)
and (3.87) satisfy the condition of (3.76) for a sufficient statistic.
Hypothesis Testing


The Bayesian paradigm discussed in Section 3.11 focused on two basic issues: predictive 
modeling of the observation space and statistical analysis aimed at parameter estimation. 
As mentioned previously in that section, these two issues are the dual of each other. In this 
section we discuss another facet of the Bayesian paradigm, aimed at hypothesis testing, 
which is basic to signal detection in digital communications, and beyond. 


To set the stage for the study of hypothesis testing, consider the model of Figure 3.14. A source of binary data emits a sequence of 0s and 1s, which are respectively denoted by hypotheses $H_0$ and $H_1$. The source (e.g., digital communication transmitter) is followed by a probabilistic transition mechanism (e.g., communication channel). According to some probabilistic law, the transition mechanism generates an observation vector $\mathbf{x}$ that defines a specific point in the observation space.

The mechanism responsible for probabilistic transition is hidden from the observer (e.g., digital communication receiver). Given the observation vector $\mathbf{x}$ and knowledge of the probabilistic law characterizing the transition mechanism, the observer chooses whether hypothesis $H_0$ or $H_1$ is true. Assuming that a decision must be made, the observer has to have a decision rule that works on the observation vector $\mathbf{x}$, thereby dividing the observation space $Z$ into two regions: $Z_0$ corresponding to $H_0$ being true and $Z_1$ corresponding to $H_1$ being true. To simplify matters, the decision rule is not shown in Figure 3.14.

In the context of a digital communication system, for example, the channel plays the role of the probabilistic transition mechanism. The observation space of some finite dimension corresponds to the ensemble of channel outputs. Finally, the receiver implements the decision rule.

Figure 3.14 Diagram illustrating the binary hypothesis-testing problem. Note: according to the likelihood ratio test, the bottom observation vector $\mathbf{x}$ is incorrectly assigned to $Z_1$.


To proceed with the solution to the binary hypothesis-testing problem, we introduce the following notations:

$f_{X|H_0}(\mathbf{x}|H_0)$ denotes the conditional density of the observation vector $\mathbf{x}$ given that hypothesis $H_0$ is true.

$f_{X|H_1}(\mathbf{x}|H_1)$ denotes the conditional density of $\mathbf{x}$ given that the other hypothesis $H_1$ is true.

$\pi_0$ and $\pi_1$ denote the priors of hypotheses $H_0$ and $H_1$, respectively.

In the context of hypothesis testing, the two conditional probability density functions $f_{X|H_0}(\mathbf{x}|H_0)$ and $f_{X|H_1}(\mathbf{x}|H_1)$ are referred to as likelihood functions, or simply likelihoods.

Suppose we perform a measurement on the transition mechanism's output, obtaining the observation vector $\mathbf{x}$. In processing $\mathbf{x}$, there are two kinds of errors that can be made by the decision rule:

Error of the first kind. This arises when hypothesis $H_0$ is true but the rule makes a decision in favor of $H_1$, as illustrated in Figure 3.14.

Error of the second kind. This arises when hypothesis $H_1$ is true but the rule makes a decision in favor of $H_0$.

The conditional probability of an error of the first kind is

$$\int_{Z_1} f_{X|H_0}(\mathbf{x}|H_0)\,d\mathbf{x}$$

where $Z_1$ is the part of the observation space that corresponds to hypothesis $H_1$. Similarly, the conditional probability of an error of the second kind is

$$\int_{Z_0} f_{X|H_1}(\mathbf{x}|H_1)\,d\mathbf{x}$$

By definition, an optimum decision rule is one for which a prescribed cost function is minimized. A logical choice for the cost function in digital communications is the average probability of symbol error, which, in a Bayesian context, is referred to as the Bayes risk. Thus, with the probable occurrence of the two kinds of errors identified above, we define the Bayes risk for the binary hypothesis-testing problem as

$$\mathcal{R} = \pi_0 \int_{Z_1} f_{X|H_0}(\mathbf{x}|H_0)\,d\mathbf{x} + \pi_1 \int_{Z_0} f_{X|H_1}(\mathbf{x}|H_1)\,d\mathbf{x} \tag{3.88}$$

where we have accounted for the prior probabilities with which hypotheses $H_0$ and $H_1$ are known to occur. Using the language of set theory, let the union of the disjoint subspaces $Z_0$ and $Z_1$ be

$$Z = Z_0 \cup Z_1$$
Then, recognizing that the subspace $Z_1$ is the complement of the subspace $Z_0$ with respect to the total observation space $Z$, we may rewrite (3.88) in the equivalent form:

$$\begin{aligned}
\mathcal{R} &= \pi_0 \int_{Z - Z_0} f_{X|H_0}(\mathbf{x}|H_0)\,d\mathbf{x} + \pi_1 \int_{Z_0} f_{X|H_1}(\mathbf{x}|H_1)\,d\mathbf{x} \\
&= \pi_0 \int_{Z} f_{X|H_0}(\mathbf{x}|H_0)\,d\mathbf{x} + \int_{Z_0} \left[\pi_1 f_{X|H_1}(\mathbf{x}|H_1) - \pi_0 f_{X|H_0}(\mathbf{x}|H_0)\right] d\mathbf{x}
\end{aligned} \tag{3.90}$$

The integral $\int_{Z} f_{X|H_0}(\mathbf{x}|H_0)\,d\mathbf{x}$ represents the total volume under the conditional density $f_{X|H_0}(\mathbf{x}|H_0)$, which, by definition, equals unity. Accordingly, we may reduce (3.90) to

$$\mathcal{R} = \pi_0 + \int_{Z_0} \left[\pi_1 f_{X|H_1}(\mathbf{x}|H_1) - \pi_0 f_{X|H_0}(\mathbf{x}|H_0)\right] d\mathbf{x} \tag{3.91}$$

The term $\pi_0$ on the right-hand side of (3.91) represents a fixed cost. The integral term represents the cost controlled by how we assign the observation vector $\mathbf{x}$ to $Z_0$. Recognizing that the two terms inside the square brackets are both nonnegative, we must therefore insist on the following plan of action for the average risk $\mathcal{R}$ to be minimized:

Make the integrand in (3.91) negative for every observation vector $\mathbf{x}$ that is assigned to $Z_0$.

In light of this statement, the optimum decision rule proceeds as follows:

1. If

$$\pi_0 f_{X|H_0}(\mathbf{x}|H_0) > \pi_1 f_{X|H_1}(\mathbf{x}|H_1)$$

then the observation vector $\mathbf{x}$ should be assigned to $Z_0$, because these two terms contribute a negative amount to the integral in (3.91). In this case, we say $H_0$ is true.

2. If, on the other hand,

$$\pi_0 f_{X|H_0}(\mathbf{x}|H_0) < \pi_1 f_{X|H_1}(\mathbf{x}|H_1)$$

then the observation vector $\mathbf{x}$ should be excluded from $Z_0$ (i.e., assigned to $Z_1$), because these two terms would contribute a positive amount to the integral in (3.91). In this second case, we say $H_1$ is true.

When the two terms are equal, the integrand contributes nothing to the average risk $\mathcal{R}$; in such a situation, the observation vector $\mathbf{x}$ may be assigned arbitrarily.
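To make the minimization of the Bayes risk concrete, the following Python sketch evaluates (3.88) for an illustrative special case that is an assumption of this sketch rather than part of the text above: a single scalar observation that is Gaussian with mean 0 under $H_0$ and mean $m > 0$ under $H_1$, with common standard deviation `sigma`, and a rule that decides $H_1$ whenever the observation exceeds a threshold `gamma`. The function names are hypothetical. The threshold returned by `optimum_threshold` is the point at which the two weighted likelihoods in the action plan are equal.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bayes_risk(gamma, m, sigma, pi0):
    """Bayes risk (3.88) for the scalar Gaussian example.

    Z_1 = {x > gamma} and Z_0 = {x <= gamma}; pi1 = 1 - pi0.
    """
    pi1 = 1.0 - pi0
    p_first_kind = q_function(gamma / sigma)         # decide H1 when H0 is true
    p_second_kind = q_function((m - gamma) / sigma)  # decide H0 when H1 is true
    return pi0 * p_first_kind + pi1 * p_second_kind


def optimum_threshold(m, sigma, pi0):
    """Threshold where pi0 * f(x|H0) = pi1 * f(x|H1) for the Gaussian pair."""
    pi1 = 1.0 - pi0
    return (sigma**2 / m) * math.log(pi0 / pi1) + m / 2.0
```

Sweeping `gamma` over a grid and comparing `bayes_risk` with its value at `optimum_threshold` gives a quick check that no other threshold yields a smaller risk in this example.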

Thus, combining points (1) and (2) on the action plan into a single decision rule, we may write

$$\frac{f_{X|H_1}(\mathbf{x}|H_1)}{f_{X|H_0}(\mathbf{x}|H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\pi_0}{\pi_1} \tag{3.92}$$

The observation-dependent quantity on the left-hand side of (3.92) is called the likelihood ratio; it is defined by

$$\Lambda(\mathbf{x}) = \frac{f_{X|H_1}(\mathbf{x}|H_1)}{f_{X|H_0}(\mathbf{x}|H_0)} \tag{3.93}$$


From this definition, we see that $\Lambda(\mathbf{x})$ is the ratio of two functions of a random variable; therefore, it follows that $\Lambda(\mathbf{x})$ is itself a random variable. Moreover, it is a one-dimensional variable, which holds regardless of the dimensionality of the observation vector $\mathbf{x}$. Most importantly, the likelihood ratio is a sufficient statistic.

The scalar quantity on the right-hand side of (3.92), namely,

$$\eta = \frac{\pi_0}{\pi_1}$$

is called the threshold of the test. Thus, minimization of the Bayes risk $\mathcal{R}$ leads to the likelihood ratio test, described by the combined form of two decisions:

$$\Lambda(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} \eta \tag{3.95}$$

Correspondingly, the hypothesis testing structure built on (3.93)-(3.95) is called the likelihood receiver; it is shown in the form of a block diagram in Figure 3.15a. An elegant characteristic of this receiver is that all the necessary data processing is confined to computing the likelihood ratio $\Lambda(\mathbf{x})$. This characteristic is of considerable practical importance: adjustments to our knowledge of the priors $\pi_0$ and $\pi_1$ are made simply through the assignment of an appropriate value to the threshold $\eta$.

The natural logarithm is known to be a monotonically increasing function of its argument. Moreover, both sides of the likelihood ratio test in (3.95) are positive. Accordingly, we may express the test in its logarithmic form, as shown by

$$\ln \Lambda(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} \ln \eta \tag{3.96}$$

where ln is the symbol for the natural logarithm. Equation (3.96) leads to the equivalent log-likelihood ratio receiver, depicted in Figure 3.15b.



Figure 3.15 Two versions of the likelihood receiver: (a) based on the likelihood ratio $\Lambda(\mathbf{x})$; (b) based on the log-likelihood ratio $\ln \Lambda(\mathbf{x})$. In both versions, the comparator says $H_1$ if the threshold is exceeded and says $H_0$ otherwise.
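The two receiver structures can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the convention of returning 0 or 1 for the chosen hypothesis are assumptions, and the likelihood under $H_0$ is assumed to be strictly positive so that the ratio is defined. Ties are resolved in favor of $H_0$, which is permissible because the assignment is arbitrary when the two sides are equal.

```python
import math


def likelihood_receiver(lik_h1, lik_h0, pi0, pi1):
    """Likelihood ratio test (3.95): return 1 for H1, 0 for H0.

    lik_h1 and lik_h0 are the likelihoods f(x|H1) and f(x|H0) evaluated
    at the observed x; lik_h0 is assumed to be strictly positive.
    """
    eta = pi0 / pi1  # threshold of the test
    return 1 if lik_h1 / lik_h0 > eta else 0


def log_likelihood_receiver(log_lik_h1, log_lik_h0, pi0, pi1):
    """Log-likelihood ratio test (3.96): return 1 for H1, 0 for H0."""
    return 1 if (log_lik_h1 - log_lik_h0) > math.log(pi0 / pi1) else 0
```

Only the threshold depends on the priors, so a change in prior knowledge leaves the likelihood computation untouched. Working with log-likelihoods is usually preferable numerically when the likelihoods are products of many small factors.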
Binary Hypothesis Testing

Consider a binary hypothesis testing problem, described by the pair of equations:

$$\begin{aligned}
\text{Hypothesis } H_1: &\quad x_i = m + n_i, \qquad i = 1, 2, \ldots, N \\
\text{Hypothesis } H_0: &\quad x_i = n_i, \qquad i = 1, 2, \ldots, N
\end{aligned} \tag{3.97}$$

The term $m$ is a constant that is nonzero only under hypothesis $H_1$. As in Example 7, the $n_i$ are independent and Gaussian $\mathcal{N}(0, \sigma_n^2)$. The requirement is to formulate a likelihood ratio test for this example to come up with a decision rule.

Following the discussion presented in Example 7, under hypothesis $H_1$ we write

$$f_{X_i|H_1}(x_i|H_1) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left[-\frac{(x_i - m)^2}{2\sigma_n^2}\right]$$


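As in Example 7, let the vector $\mathbf{x}$ denote the set of $N$ observations $x_i$ for $i = 1, 2, \ldots, N$. Then, invoking the independence of the $n_i$, we may express the joint density of the $x_i$ under hypothesis $H_1$ as

$$\begin{aligned}
f_{X|H_1}(\mathbf{x}|H_1) &= \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left[-\frac{(x_i - m)^2}{2\sigma_n^2}\right] \\
&= \frac{1}{(\sqrt{2\pi}\,\sigma_n)^N} \exp\left[-\frac{1}{2\sigma_n^2} \sum_{i=1}^{N} (x_i - m)^2\right]
\end{aligned} \tag{3.99}$$
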

Setting $m$ to zero in (3.99), we get the corresponding joint density of the $x_i$ under hypothesis $H_0$ as

$$f_{X|H_0}(\mathbf{x}|H_0) = \frac{1}{(\sqrt{2\pi}\,\sigma_n)^N} \exp\left[-\frac{1}{2\sigma_n^2} \sum_{i=1}^{N} x_i^2\right] \tag{3.100}$$


Hence, substituting (3.99) and (3.100) into the likelihood ratio of (3.93), we get (after canceling common terms)

$$\Lambda(\mathbf{x}) = \exp\left(\frac{m}{\sigma_n^2} \sum_{i=1}^{N} x_i - \frac{N m^2}{2\sigma_n^2}\right) \tag{3.101}$$

Equivalently, we may express the likelihood ratio in its logarithmic form

$$\ln \Lambda(\mathbf{x}) = \frac{m}{\sigma_n^2} \sum_{i=1}^{N} x_i - \frac{N m^2}{2\sigma_n^2} \tag{3.102}$$

Using (3.102) in the log-likelihood ratio test of (3.96), we get

$$\frac{m}{\sigma_n^2} \sum_{i=1}^{N} x_i - \frac{N m^2}{2\sigma_n^2} \underset{H_0}{\overset{H_1}{\gtrless}} \ln \eta$$

Dividing both sides of this test by $m/\sigma_n^2$ (with $m$ taken to be positive, so that the direction of the inequality is preserved) and rearranging terms, we finally write

$$\sum_{i=1}^{N} x_i \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\sigma_n^2}{m} \ln \eta + \frac{N m}{2} \tag{3.103}$$

where the threshold $\eta$ is itself defined by the ratio of priors, namely $\pi_0/\pi_1$. Equation (3.103) is the desired formula for the decision rule to solve the binary hypothesis-testing problem of (3.97).

One last comment is in order. As with Example 7, the sum of the $x_i$ over the $N$ observations; that is,

$$t(\mathbf{x}) = \sum_{i=1}^{N} x_i$$

is a sufficient statistic for the problem at hand. We say so because the only way in which the observations can enter the likelihood ratio $\Lambda(\mathbf{x})$ is in the sum; see (3.101).
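The decision rule of this example is simple enough to sketch directly. The Python fragment below is a minimal illustration under the assumption $m > 0$; the function names are hypothetical. It implements the rule based on the sufficient statistic, as in (3.103), alongside the log-likelihood ratio of (3.102), so that the two forms of the test can be compared on the same data.

```python
import math


def log_likelihood_ratio(x, m, sigma_n):
    """ln Lambda(x) from (3.102) for the Gaussian model of (3.97)."""
    n = len(x)
    return (m / sigma_n**2) * sum(x) - n * m**2 / (2.0 * sigma_n**2)


def decide_by_llr(x, m, sigma_n, pi0, pi1):
    """Log-likelihood ratio test (3.96): 1 means H1, 0 means H0."""
    return 1 if log_likelihood_ratio(x, m, sigma_n) > math.log(pi0 / pi1) else 0


def decide_by_sum(x, m, sigma_n, pi0, pi1):
    """Decision rule (3.103) using only the sufficient statistic sum(x).

    Assumes m > 0, so dividing by m / sigma_n^2 keeps the inequality direction.
    """
    n = len(x)
    threshold = (sigma_n**2 / m) * math.log(pi0 / pi1) + n * m / 2.0
    return 1 if sum(x) > threshold else 0
```

With equal priors the threshold on the sum reduces to $Nm/2$, that is, the sample mean is compared with the midpoint $m/2$ between the two hypothesized means.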


Now that we understand binary hypothesis testing, we are ready to consider the more 
general scenario where we have M possible source outputs to deal with. As before, we 
assume that a decision must be made as to which one of the M possible source outputs was 
actually emitted, given an observation vector x. 

To develop insight into how to construct a decision rule for testing multiple hypotheses, 
we consider first the case of M = 3 and then generalize the result. Moreover, in formulating 
the decision rule, we will use probabilistic reasoning that builds on the findings of the 
binary hypothesis-testing procedure. In this context, however, we find it more convenient 
to work with likelihood functions rather than likelihood ratios. 

To proceed then, suppose we make a measurement on the probabilistic transition mechanism's output, obtaining the observation vector $\mathbf{x}$. We use this observation vector and knowledge of the probability law characterizing the transition mechanism to construct three likelihood functions, one for each of the three possible hypotheses. For the sake of illustrating what we have in mind, suppose further that in formulating the three possible probabilistic inequalities, each with its own inference, we get the following three results:

1. $\pi_1 f_{X|H_1}(\mathbf{x}|H_1) < \pi_0 f_{X|H_0}(\mathbf{x}|H_0)$, from which we infer that hypothesis $H_0$ or $H_2$ is true.

2. $\pi_2 f_{X|H_2}(\mathbf{x}|H_2) < \pi_0 f_{X|H_0}(\mathbf{x}|H_0)$, from which we infer that hypothesis $H_0$ or $H_1$ is true.

3. $\pi_2 f_{X|H_2}(\mathbf{x}|H_2) < \pi_1 f_{X|H_1}(\mathbf{x}|H_1)$, from which we infer that hypothesis $H_1$ or $H_0$ is true.

Examining these three possible results for $M = 3$, we immediately see that hypothesis $H_0$ is the only one that shows up in all three inferences. Accordingly, for the particular scenario we have picked, the decision rule should say that hypothesis $H_0$ is true. Moreover, it is a straightforward matter for us to make similar statements pertaining to hypothesis $H_1$ or $H_2$. The rationale just described for arriving at this test is an example of what we mean by probabilistic reasoning: the use of multiple inferences to reach a specific decision.

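For an equivalent test, let both sides of each inequality under points 1, 2, and 3 be divided by the evidence $f_X(\mathbf{x})$. Let $H_i$, $i = 0, 1, 2$, denote the three hypotheses. We may then use the definition of joint probability density function to write

$$\begin{aligned}
\frac{\pi_i f_{X|H_i}(\mathbf{x}|H_i)}{f_X(\mathbf{x})} &= \frac{P(H_i)\, f_{X|H_i}(\mathbf{x}|H_i)}{f_X(\mathbf{x})}, \qquad \text{where } P(H_i) = \pi_i \\
&= \frac{P[H_i, \mathbf{x}]}{f_X(\mathbf{x})} \\
&= \frac{P[H_i|\mathbf{x}]\, f_X(\mathbf{x})}{f_X(\mathbf{x})} \\
&= P[H_i|\mathbf{x}] \qquad \text{for } i = 0, 1, 2
\end{aligned}$$

Hence, recognizing that the conditional probability $P[H_i|\mathbf{x}]$ is actually the posterior probability of hypothesis $H_i$ after receiving the observation vector $\mathbf{x}$, we may now go on to generalize the equivalent test for $M$ possible source outputs as follows:

Given an observation vector $\mathbf{x}$, choose the hypothesis $H_i$ for which the posterior probability $P[H_i|\mathbf{x}]$ is largest, $i = 0, 1, \ldots, M - 1$.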


A processor based on this decision rule is frequently referred to as the MAP probability 
computer. It is with this general hypothesis testing rule that earlier we made the 
supposition embodied under points 1, 2, and 3. 
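A MAP probability computer of this kind can be sketched as follows. The sketch makes assumptions that go beyond the text: the observation is a single scalar, and under hypothesis $H_i$ it is Gaussian with mean `means[i]` and a common standard deviation `sigma`. The function names are hypothetical. The posteriors are obtained by weighting each likelihood by its prior and dividing by the evidence, and the decision is the index of the largest posterior.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Gaussian density with the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma**2)) / (math.sqrt(2.0 * math.pi) * sigma)


def posteriors(x, priors, means, sigma):
    """Posterior probabilities P[H_i | x] for i = 0, ..., M - 1."""
    weighted = [p * gaussian_pdf(x, mu, sigma) for p, mu in zip(priors, means)]
    evidence = sum(weighted)  # f_X(x), the common denominator
    return [w / evidence for w in weighted]


def map_decision(x, priors, means, sigma):
    """Index i of the hypothesis H_i with the largest posterior probability."""
    post = posteriors(x, priors, means, sigma)
    return max(range(len(post)), key=post.__getitem__)
```

Because the evidence is common to all hypotheses, the same decision results from comparing the products of prior and likelihood directly; the division is needed only when the posterior values themselves are of interest.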

Composite Hypothesis Testing 


Throughout the discussion presented in Section 3.13, the hypotheses considered therein 
were all simple, in that the probability density function for each hypothesis was 
completely specified. However, in practice, it is common to find that one or more of the 
probability density functions are not simple due to imperfections in the probabilistic 
transition mechanism. In situations of this kind, the hypotheses are said to be composite. 

As an illustrative example, let us revisit the binary hypothesis-testing problem considered in Example 8. This time, however, we treat the mean $m$ of the observable $x_i$ under hypothesis $H_1$ not as a constant, but as a variable inside some interval $[m_a, m_b]$. If, then, we were to use the likelihood ratio test of (3.93) for simple binary hypothesis testing, we would find that the likelihood ratio $\Lambda(\mathbf{x})$ involves the unknown mean $m$. We cannot therefore compute $\Lambda(\mathbf{x})$, thereby negating applicability of the simple likelihood ratio test.

The message to take from this illustrative example is that we have to modify the likelihood ratio test to make it applicable to composite hypotheses. To this end, consider the model depicted in Figure 3.16, which is similar to that of Figure 3.14 for the simple case except for one difference: the transition mechanism is now characterized by the conditional probability density function $f_{X|\Theta,H_i}(\mathbf{x}|\boldsymbol{\theta}, H_i)$, where $\boldsymbol{\theta}$ is a realization of the unknown parameter vector $\boldsymbol{\Theta}$, and the index $i = 0, 1$. It is the conditional dependence on $\boldsymbol{\theta}$ that makes the hypotheses $H_0$ and $H_1$ of the composite kind. Unlike the simple model of Figure 3.14, we now have two spaces to deal with: an observation space and a parameter space. It is assumed that the conditional probability density function of the unknown parameter vector $\boldsymbol{\Theta}$, that is, $f_{\Theta|H_i}(\boldsymbol{\theta}|H_i)$, is known for $i = 0, 1$.

Figure 3.16 Model of composite hypothesis-testing for a binary scenario.

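To formulate the likelihood ratio for the composite hypotheses described in the model of Figure 3.16, we require the likelihood function $f_{X|H_i}(\mathbf{x}|H_i)$ for $i = 0, 1$. We may satisfy this requirement by reducing the composite hypothesis-testing problem to a simple one by integrating over $\boldsymbol{\theta}$, as shown by

$$f_{X|H_i}(\mathbf{x}|H_i) = \int_{\Theta} f_{X|\Theta,H_i}(\mathbf{x}|\boldsymbol{\theta}, H_i)\, f_{\Theta|H_i}(\boldsymbol{\theta}|H_i)\, d\boldsymbol{\theta}$$

the evaluation of which is contingent on knowing the conditional probability density function of $\boldsymbol{\Theta}$ given $H_i$ for $i = 0, 1$. With this specification at hand, we may now formulate the likelihood ratio for composite hypotheses as

$$\Lambda(\mathbf{x}) = \frac{\displaystyle\int_{\Theta} f_{X|\Theta,H_1}(\mathbf{x}|\boldsymbol{\theta}, H_1)\, f_{\Theta|H_1}(\boldsymbol{\theta}|H_1)\, d\boldsymbol{\theta}}{\displaystyle\int_{\Theta} f_{X|\Theta,H_0}(\mathbf{x}|\boldsymbol{\theta}, H_0)\, f_{\Theta|H_0}(\boldsymbol{\theta}|H_0)\, d\boldsymbol{\theta}}$$

Accordingly, we may now extend applicability of the likelihood ratio test described in (3.95) to composite hypotheses.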
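As a rough numerical illustration of the integration over the unknown parameter, consider again the model of Example 8 with the mean $m$ unknown under $H_1$. The Python sketch below assumes, purely for illustration, that $m$ is uniformly distributed over $[m_a, m_b]$ under $H_1$ (the text only requires that this density be known) and that $H_0$ involves no unknown parameter, so the denominator integral reduces to the simple likelihood. The function names and the use of the trapezoidal rule are choices made for this sketch.

```python
import math


def conditional_lr(x, m, sigma_n):
    """Likelihood ratio (3.101) for a fixed, known value of the mean m."""
    n = len(x)
    return math.exp((m / sigma_n**2) * sum(x) - n * m**2 / (2.0 * sigma_n**2))


def composite_lr(x, m_a, m_b, sigma_n, steps=1000):
    """Composite likelihood ratio with m ~ Uniform[m_a, m_b] under H1.

    Averages conditional_lr over the interval with the trapezoidal rule;
    the uniform density 1 / (m_b - m_a) is an assumption of this sketch.
    """
    h = (m_b - m_a) / steps
    total = 0.5 * (conditional_lr(x, m_a, sigma_n) + conditional_lr(x, m_b, sigma_n))
    for k in range(1, steps):
        total += conditional_lr(x, m_a + k * h, sigma_n)
    return (total * h) / (m_b - m_a)
```

The returned value can be compared with the threshold $\eta$ exactly as in the simple case. The extra cost relative to the simple test is the numerical integration, which grows quickly with the dimension of the parameter vector.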

From this discussion, it is apparent that hypothesis testing for composite hypotheses is computationally more demanding than it is for simple hypotheses. Chapter 7 presents applications of composite hypothesis testing to noncoherent detection, in the course of which the phase information in the received signal is accounted for.

Summary and Discussion 


The material presented in this chapter on probability theory is another mathematical pillar in the study of communication systems. Herein, the emphasis has been on how to deal with uncertainty, which is a natural feature of every communication system in one form or another. Typically, uncertainties affect the behavior of channels connecting the transmitter of a communication system to its receiver. Sources of uncertainty include noise, generated internally and externally, and interference from other transmitters.

In this chapter, the emphasis has been on probabilistic modeling, in the context of 
which we did the following: 

1. Starting with set theory, we went on to state the three axioms of probability theory. This introductory material set the stage for the calculation of probabilities and conditional probabilities of events of interest. When partial information is available on the outcome of an experiment, conditional probabilities permit us to reason in a probabilistic sense and thereby enrich our understanding of a random experiment.

2. We discussed the notion of random variables, which provide the natural tools for formulating probabilistic models of random experiments. In particular, we characterized continuous random variables in terms of the cumulative distribution function and probability density function; the latter contains all the conceivable information about a random variable. Through focusing on the mean of a random variable, we studied the expectation or averaging operator, which occupies a dominant role in probability theory. The mean and the variance, considered in that order, provide a weak characterization of a random variable. We also introduced the characteristic function as another way of describing the statistics of a random variable. Although much of the material in the early part of the chapter focused on continuous random variables, we did emphasize important aspects of discrete random variables by describing the concept of the probability mass function (unique to discrete random variables) and the parallel development and similar concepts that embody these two kinds of random variables.

3. Table 3.2 summarizes the probabilistic descriptions of some important random variables under two headings: discrete and continuous. Except for the Rayleigh random variable, these random variables were discussed in the text or are given as end-of-chapter problems; the Rayleigh random variable is discussed in Chapter 4. Appendix A presents advanced probabilistic models that go beyond the contents of Table 3.2.

4. We discussed the characterization of a pair of random variables and introduced the basic concepts of covariance and correlation, and the independence of random variables.

5. We provided a detailed description of the Gaussian distribution and discussed its important properties. Gaussian random variables play a key role in the study of communication systems.

The second part of the chapter focused on the Bayesian paradigm, wherein inference may 
take one of two forms: 

• Probabilistic modeling, the aim of which is to develop a model for describing the 
physical behavior of an observation space. 

• Statistical analysis, the aim of which is the inverse of probabilistic modeling. 

In a fundamental sense, statistical analysis is more profound than probabilistic modeling, 
hence the focused attention on it in the chapter. 
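Table 3.2 Some important random variables

Discrete random variables

Bernoulli:
$$p_X(x) = \begin{cases} 1 - p & \text{if } x = 0 \\ p & \text{if } x = 1 \\ 0 & \text{otherwise} \end{cases}$$
$E[X] = p$, $\operatorname{var}[X] = p(1 - p)$

Poisson:
$$p_X(k) = \frac{\lambda^k}{k!} \exp(-\lambda), \qquad k = 0, 1, 2, \ldots, \text{ and } \lambda > 0$$
$E[X] = \lambda$, $\operatorname{var}[X] = \lambda$

Continuous random variables

Uniform:
$$f_X(x) = \frac{1}{b - a}, \qquad a \le x \le b$$
$E[X] = \frac{1}{2}(a + b)$, $\operatorname{var}[X] = \frac{1}{12}(b - a)^2$

Exponential:
$$f_X(x) = \lambda \exp(-\lambda x), \qquad x \ge 0 \text{ and } \lambda > 0$$
$E[X] = 1/\lambda$, $\operatorname{var}[X] = 1/\lambda^2$

Gaussian:
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty$$
$E[X] = \mu$, $\operatorname{var}[X] = \sigma^2$

Rayleigh:
$$f_X(x) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right), \qquad x \ge 0 \text{ and } \sigma > 0$$
$E[X] = \sigma\sqrt{\pi/2}$, $\operatorname{var}[X] = \left(2 - \frac{\pi}{2}\right)\sigma^2$

Laplacian:
$$f_X(x) = \frac{\lambda}{2} \exp(-\lambda |x|), \qquad -\infty < x < \infty \text{ and } \lambda > 0$$
$E[X] = 0$, $\operatorname{var}[X] = 2/\lambda^2$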
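The mean and variance entries for the continuous random variables can be spot-checked by numerical integration. The Python sketch below is offered only as a sanity check, not as part of the table; the function names, the midpoint rule, and the finite integration limits (chosen wide enough that the neglected tails are negligible) are all assumptions of the sketch. It is shown for the Rayleigh and Laplacian entries.

```python
import math


def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


def rayleigh_pdf(x, sigma):
    """Rayleigh density for x >= 0."""
    return (x / sigma**2) * math.exp(-(x**2) / (2.0 * sigma**2))


def laplacian_pdf(x, lam):
    """Laplacian (two-sided exponential) density."""
    return 0.5 * lam * math.exp(-lam * abs(x))


def mean_and_variance(pdf, a, b):
    """Numerical mean and variance of a density supported (effectively) on [a, b]."""
    mean = integrate(lambda x: x * pdf(x), a, b)
    second_moment = integrate(lambda x: x * x * pdf(x), a, b)
    return mean, second_moment - mean**2
```

For example, `mean_and_variance(lambda x: rayleigh_pdf(x, 2.0), 0.0, 40.0)` can be compared with $\sigma\sqrt{\pi/2}$ and $(2 - \pi/2)\sigma^2$ evaluated at $\sigma = 2$.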
Under statistical analysis, viewed from a digital communications perspective, we discussed the following:

1. Parameter estimation, where the requirement is to estimate an unknown parameter given an observation vector; herein we covered:

• the maximum a posteriori (MAP) rule, which requires prior information, and

• the maximum likelihood procedure, which bypasses the need for the prior and therefore sits on the fringe of the Bayesian paradigm.

2. Hypothesis testing, where in a simple but important scenario, we have two hypotheses to deal with, namely $H_1$ and $H_0$. In this case, the requirement is to make an optimal decision in favor of hypothesis $H_1$ or hypothesis $H_0$ given an observation vector. The likelihood ratio test plays the key role here.

To summarize, the material on probability theory sets the stage for the study of stochastic 
processes in Chapter 4. On the other hand, the material on Bayesian inference plays a key 
role in Chapters 7, 8, and 9 in one form or another. 


Problems 

Set Theory 

Using Venn diagrams, justify the five properties of the algebra of sets, which were stated (without 
proofs) in Section 3.1: 
idempotence property 
commutative property 
associative property 
distributive property 
De Morgan’s laws. 

Let $A$ and $B$ denote two different sets. Validate the following three equalities:

$$A^c = (A^c \cap B) \cup (A^c \cap B^c)$$

$$B^c = (A \cap B^c) \cup (A^c \cap B^c)$$

$$(A \cap B)^c = (A^c \cap B) \cup (A^c \cap B^c) \cup (A \cap B^c)$$

Probability Theory 

Using the Bernoulli distribution of Table 3.2, develop an experiment that involves three independent 
tosses of a fair coin. Irrespective of whether the toss is a head or tail, the probability of every toss is 
to be conditioned on the results of preceding tosses. Display graphically the sequential evolution of 
the results. 

Use Bayes' rule to convert the conditioning of event $B$ given event $A_i$ into the conditioning of event $A_i$ given event $B$ for $i = 1, 2, \ldots, N$.

A discrete memoryless channel is used to transmit binary data. The channel is discrete in that it is 
designed to handle discrete messages and it is memoryless in that at any instant of time the channel 
output depends on the channel input only at that time. Owing to the unavoidable presence of noise in 
the channel, errors are made in the received binary data stream. The channel is symmetric in that the 
probability of receiving symbol 1 when symbol 0 is sent is the same as the probability of receiving 
symbol 0 when symbol 1 is sent. 
The transmitter sends 0s across the channel with probability $p_0$ and 1s with probability $p_1$. The receiver occasionally makes random decision errors with probability $p$; that is, when symbol 0 is sent across the channel, the receiver makes a decision in favor of symbol 1, and vice versa.

Referring to Figure P3.5, determine the following a posteriori probabilities:

a. The conditional probability of sending symbol $A_0$ given that symbol $B_0$ was received.

b. The conditional probability of sending symbol $A_1$ given that symbol $B_1$ was received.

Hint: Formulate expressions for the probability of receiving event $B_0$, and likewise for event $B_1$.


Let $B_1, B_2, \ldots, B_n$ denote a set of disjoint events whose union equals the sample space $S$, and assume that $P[B_i] > 0$ for all $i$. Let $A$ be any event in the sample space $S$.

a. Show that

$$A = (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_n)$$

b. The total probability theorem states:

$$P[A] = P[A|B_1]P[B_1] + P[A|B_2]P[B_2] + \cdots + P[A|B_n]P[B_n]$$

This theorem is useful for finding the probability of event $A$ when the conditional probabilities $P[A|B_i]$ are known or easy to find for all $i$. Justify the theorem.

Figure P3.7 shows the connectivity diagram of a computer network that connects node A to node B along different possible paths. The labeled branches of the diagram display the probabilities for which the links in the network are up; for example, 0.8 is the probability that the link from node A to intermediate node C is up, and so on for the other links. Link failures in the network are assumed to be independent of each other.

a. When all the links in the network are up, find the probability that there is a path connecting node A to node B.

b. What is the probability of complete failure in the network, with no connection from node A to node B?
Distribution Functions

The probability density function of a continuous random variable $X$ is defined by

$$f_X(x) = \begin{cases} \dfrac{c}{\sqrt{x}} & \text{for } 0 < x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Despite the fact that this function becomes infinitely large as $x$ approaches zero, it may qualify to be a legitimate probability density function. Find the value of scalar $c$ for which this condition is satisfied.

The joint probability density function of two random variables $X$ and $Y$ is defined by the two-dimensional uniform distribution

$$f_{X,Y}(x, y) = \begin{cases} c & \text{for } a \le x \le b \text{ and } a \le y \le b \\ 0 & \text{otherwise} \end{cases}$$

Find the scalar $c$ for which $f_{X,Y}(x, y)$ satisfies the normalization property of a two-dimensional probability density function.

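In Table 3.2, the probability density function of a Rayleigh random variable is defined by

$$f_X(x) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right) \qquad \text{for } x \ge 0 \text{ and } \sigma > 0$$

a. Show that the mean of $X$ is

$$E[X] = \sigma\sqrt{\frac{\pi}{2}}$$

b. Using the result of part a, show that the variance of $X$ is

$$\operatorname{var}[X] = \left(2 - \frac{\pi}{2}\right)\sigma^2$$

c. Use the results of a and b to determine the Rayleigh cumulative distribution function.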

The probability density function of an exponentially distributed random variable $X$ is defined by

$$f_X(x) = \begin{cases} \lambda \exp(-\lambda x), & \text{for } 0 \le x < \infty \\ 0, & \text{otherwise} \end{cases}$$

where $\lambda$ is a positive parameter.

a. Show that $f_X(x)$ is a legitimate probability density function.

b. Determine the cumulative distribution function of $X$.


Consider the one-sided conditional exponential distribution

$$f_X(x|\lambda) = \begin{cases} \dfrac{\lambda}{Z(\lambda)} \exp(-\lambda x), & 1 \le x \le 20 \\ 0, & \text{otherwise} \end{cases}$$

where $\lambda > 0$ and $Z(\lambda)$ is the normalizing constant required to make the area under $f_X(x|\lambda)$ equal unity.

a. Determine the normalizing constant $Z(\lambda)$.

b. Given $N$ independent values of $x$, namely $x_1, x_2, \ldots, x_N$, use Bayes' rule to formulate the conditional probability density function of the parameter $\lambda$, given this data set.
Expectation Operator

In Section 3.6 we described two properties of the expectation operator E, one on linearity and the other on statistical independence. In this problem, we address two other important properties of the expectation operator.

a. Scaling property: Show that

$$E[aX] = aE[X]$$

where $a$ is a constant scaling factor.

b. Linearity of conditional expectation: Show that

$$E[X_1 + X_2 | Y] = E[X_1 | Y] + E[X_2 | Y]$$

Validate the expected value rule of (3.41) by building on two expressions:

a. $g(x) = \max[g(x), 0] - \max[-g(x), 0]$

b. For any $a > 0$, $g(x) > a$ provided that $\max[g(x), 0] > a$

Let $X$ be a discrete random variable with probability mass function $p_X(x)$ and let $g(X)$ be a function of the random variable $X$. Prove the following rule:

$$E[g(X)] = \sum_{x} g(x)\, p_X(x)$$

where the summation is over all possible discrete values of $X$.

Continuing with the Bernoulli random variable $X$ in (3.23), find the mean and variance of $X$.

The probability mass function of the Poisson random variable $X$ is defined by

$$p_X(k) = \frac{1}{k!} \lambda^k \exp(-\lambda), \qquad k = 0, 1, 2, \ldots, \text{ and } \lambda > 0$$

Find the mean and variance of $X$.


Find the mean and variance of the exponentially distributed random variable $X$ in Problem 3.11.

The probability density function of the Laplacian random variable $X$ in Table 3.2 is defined by

$$f_X(x) = \begin{cases} \frac{1}{2}\lambda \exp(-\lambda x) & \text{for } x \ge 0 \\ \frac{1}{2}\lambda \exp(\lambda x) & \text{for } x < 0 \end{cases}$$

for the parameter $\lambda > 0$. Find the mean and variance of $X$.

In Example 5 we used the characteristic function $\Phi_X(\nu)$ to calculate the mean of an exponentially distributed random variable $X$. Continuing with that example, calculate the variance of $X$ and check your result against that found in Problem 3.18.

The characteristic function of a continuous random variable $X$, denoted by $\Phi_X(\nu)$, has some important properties of its own:

a. The transformed version of the random variable $X$, namely, $aX + b$, has the following characteristic function

$$E[\exp(j\nu(aX + b))] = \exp(jb\nu) \cdot \Phi_X(a\nu)$$

where $a$ and $b$ are constants.

b. The characteristic function $\Phi_X(\nu)$ is real if, and only if, the distribution function $F_X(x)$, pertaining to the random variable $X$, is symmetric.

Prove the validity of these two properties, and demonstrate that property b is satisfied by the two-sided exponential distribution described in Problem 3.19.

Let $X$ and $Y$ be two continuous random variables. One version of the total expectation theorem states

$$E[X] = \int_{-\infty}^{\infty} E[X | Y = y]\, f_Y(y)\, dy$$

Justify this theorem.


Inequalities and Theorems 

Let $X$ be a continuous random variable that can only assume nonnegative values. The Markov inequality states

$$P[X \ge a] \le \frac{1}{a} E[X], \qquad a > 0$$

Justify this inequality.

In (3.46) we stated the Chebyshev inequality without proof. Justify this inequality.

Hint: consider the probability $P[(X - \mu)^2 \ge \varepsilon^2]$ and then apply the Markov inequality, considered in Problem 3.23, with $a = \varepsilon^2$.

Consider a sequence $X_1, X_2, \ldots, X_n$ of independent and identically distributed random variables with mean $\mu$ and variance $\sigma^2$. The sample mean of this sequence is defined by

$$M_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$

The weak law of large numbers states

$$\lim_{n \to \infty} P[\,|M_n - \mu| \ge \varepsilon\,] = 0 \qquad \text{for } \varepsilon > 0$$

Justify this law. Hint: use the Chebyshev inequality.

Let event $A$ denote one of the possible outcomes of a random experiment. Suppose that in $n$ independent trials of the experiment the event $A$ occurs $n_A$ times. The ratio

$$M_n = \frac{n_A}{n}$$

is called the relative frequency or empirical frequency of the event $A$. Let $p = P[A]$ denote the probability of the event $A$. The experiment is said to exhibit "statistical regularity" if the relative frequency $M_n$ is most likely to be within $\varepsilon$ of $p$ for large $n$. Use the weak law of large numbers, considered in Problem 3.25, to justify this statement.


The Gaussian Distribution 

In the literature on signaling over additive white Gaussian noise (AWGN) channels, formulas are derived for probabilistic error calculations using the complementary error function

$$\operatorname{erfc}(x) = 1 - \frac{2}{\sqrt{\pi}} \int_{0}^{x} \exp(-t^2)\, dt$$

Show that erfc(x) is related to the Q-function as follows:

a. $Q(x) = \frac{1}{2} \operatorname{erfc}\left(\dfrac{x}{\sqrt{2}}\right)$

b. $\operatorname{erfc}(x) = 2Q(\sqrt{2}\,x)$

Equation (3.58) defines the probability density function of a Gaussian random variable $X$. Show that the area under this function is unity, in accordance with the normalization property described in (3.59).

Continuing with Problem 3.28, justify the four properties of the Gaussian distribution stated in Section 3.8 without proofs.

a. Show that the characteristic function of a Gaussian random variable $X$ of mean $\mu_X$ and variance $\sigma_X^2$ is

$$\Phi_X(\nu) = \exp\left(j\nu\mu_X - \frac{1}{2}\nu^2\sigma_X^2\right)$$

b. Using the result of part a, show that the $n$th central moment of this Gaussian random variable is as follows:

$$E[(X - \mu_X)^n] = \begin{cases} 1 \times 3 \times 5 \cdots (n - 1)\,\sigma_X^n & \text{for } n \text{ even} \\ 0 & \text{for } n \text{ odd} \end{cases}$$

A Gaussian-distributed random variable $X$ of zero mean and variance $\sigma_X^2$ is transformed by a piecewise-linear rectifier characterized by the input-output relation (see Figure P3.31):

$$Y = \begin{cases} X, & X \ge 0 \\ 0, & X < 0 \end{cases}$$

The probability density function of the new random variable $Y$ is described by

$$f_Y(y) = \begin{cases} 0, & y < 0 \\ k\,\delta(y), & y = 0 \\ \dfrac{1}{\sqrt{2\pi}\,\sigma_X} \exp\left(-\dfrac{y^2}{2\sigma_X^2}\right), & y > 0 \end{cases}$$

a. Explain the physical reasons for the functional form of this result.

b. Determine the value of the constant $k$ by which the delta function $\delta(y)$ is weighted.


In Section 3.9 we stated the central limit theorem embodied in (3.71) without proof. Justify this 
theorem. 

Bayesian Inference 

Justify the likelihood principle stated (without proof) in Section 3.11. 

In this problem we address a procedure for estimating the mean of the random variable; the 
procedure was discussed in Section 3.6. 
Consider a Gaussian-distributed variable X with unknown mean $\mu_X$ and unit variance. The mean $\mu_X$
is itself a random variable, uniformly distributed over the interval [a, b]. To do the estimation, we are
given N independent observations of the random variable X. Justify the estimator of (3.36).

In this problem, we address the issue of estimating the standard deviation $\sigma$ of a Gaussian-distributed
random variable X of zero mean. The standard deviation itself is uniformly distributed
inside the interval $[\sigma_1, \sigma_2]$. For the estimation, we have N independent observations of the random
variable X, namely, $x_1, x_2, \ldots, x_N$.

a. Derive a formula for the estimator $\hat{\sigma}$ using the MAP rule.

b. Repeat the estimation using the maximum likelihood criterion.

c. Comment on the results of parts a and b.

A binary symbol X is transmitted over a noisy channel. Specifically, symbol X = 1 is transmitted with
probability p and symbol X = 0 is transmitted with probability (1 - p). The received signals at the
channel output are defined by

$$Y = X + N$$

The random variable N represents channel noise, modeled as a Gaussian-distributed random variable
with zero mean and unit variance. The random variables X and N are independent.

a. Describe how the conditional probability $P[X = 0 \mid Y = y]$ varies with increasing y, all the way
from $-\infty$ to $+\infty$.

b. Repeat the problem for the conditional probability $P[X = 1 \mid Y = y]$.

Consider an experiment involving the Poisson distribution, whose parameter $\lambda$ is unknown. Given
that the distribution of $\lambda$ follows the exponential law

$$f_\Lambda(\lambda) = \begin{cases} a \exp(-a\lambda), & \lambda \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

where a > 0, show that the MAP estimate of the parameter $\lambda$ is given by

$$\hat{\lambda}_{\text{MAP}}(k) = \frac{k}{1 + a}$$

where k is the number of events used in the observation.

In this problem we investigate the use of analytic arguments to justify the optimality of the MAP 
estimate for the simple case of a one-dimensional parameter vector. 

Define the estimation error 

$$e_\theta(\mathbf{x}) = \theta - \hat{\theta}(\mathbf{x})$$

where $\theta$ is the value of an unknown parameter, $\hat{\theta}(\mathbf{x})$ is the estimator to be optimized, and $\mathbf{x}$ is the
observation vector. Figure P3.38 shows a uniform cost function, $C(e_\theta)$, for this problem, with zero
cost being incurred only when the absolute value of the estimation error $e_\theta(\mathbf{x})$ is less than or equal
to $\Delta/2$.

a. Formulate the Bayes’ risk $\mathcal{R}$ for this parameter estimation problem, accounting for the joint
probability density function $f_{\Theta, \mathbf{X}}(\theta, \mathbf{x})$.

b. Hence, determine the MAP estimate $\hat{\theta}_{\text{MAP}}$ by minimizing the risk $\mathcal{R}$ with respect to $\hat{\theta}(\mathbf{x})$. For
this minimization, assume that $\Delta$ is an arbitrarily small number but nonzero.
In this problem we generalize the likelihood ratio test for simple binary hypotheses by including
costs incurred in the decision-making process. Let $C_{ij}$ denote the cost incurred in deciding in favor of
hypothesis $H_i$ when hypothesis $H_j$ is true. Hence, show that the likelihood ratio test of (3.95) still
holds, except for the fact that the threshold of the test is now defined by

$$\frac{\pi_0 (C_{10} - C_{00})}{\pi_1 (C_{01} - C_{11})}$$


Consider a binary hypothesis-testing procedure where the two hypotheses $H_0$ and $H_1$ are described
by different Poisson distributions, characterized by the parameters $\lambda_0$ and $\lambda_1$, respectively. The
observation is simply a number of events k, depending on whether $H_0$ or $H_1$ is true. Specifically, for
these two hypotheses, the probability mass functions are defined by

$$p_{X_i}(k) = \frac{(\lambda_i)^k}{k!} \exp(-\lambda_i), \quad k = 0, 1, 2, \ldots$$

where i = 0 for hypothesis $H_0$ and i = 1 for hypothesis $H_1$. Determine the log-likelihood ratio test for
this problem.

Consider the binary hypothesis-testing problem

$$H_1: X = M + N$$
$$H_0: X = N$$

The M and N are independent exponentially distributed random variables, as shown by

$$p_M(m) = \begin{cases} \lambda_m \exp(-\lambda_m m), & m \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

$$p_N(n) = \begin{cases} \lambda_n \exp(-\lambda_n n), & n \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

Determine the likelihood ratio test for this problem.

In this problem we revisit Example 8. But this time we assume that the mean m under hypothesis $H_1$
is Gaussian distributed, as shown by

$$f_{M|H_1}(m \mid H_1) = \frac{1}{\sqrt{2\pi}\,\sigma_m} \exp\!\left(-\frac{m^2}{2\sigma_m^2}\right)$$


Derive the likelihood ratio test for the composite hypothesis scenario just described. 
Compare your result with that derived in Example 8. 
Notes


1. For a readable account of probability theory, see Bertsekas and Tsitsiklis (2008). For an advanced
treatment of probability theory aimed at electrical engineering, see the book by Fine (2006). For an 
advanced treatment of probability theory, see the two-volume book by Feller (1968, 1971). 

2. For an interesting account of inference, see the book by MacKay (2003). 

3. For a detailed treatment of the characterization of discrete random variables, see Chapter 2 of the 
book by Bertsekas and Tsitsiklis (2008). 

4. Indeed, we may readily transform the probability density function of (3.58) into the standard
form by using the linear transformation

$$y = \frac{1}{\sigma}(x - \mu)$$

In so doing, (3.58) is simplified as follows:

$$f_Y(y) = \frac{1}{\sqrt{2\pi}} \exp(-y^2/2)$$

which has exactly the same mathematical form as (3.65), except for the use of y in place of x.

5. Calculations based on Bayes' rule, presented previously as (3.14), are referred to as “Bayesian.” 
In actual fact, Bayes provided a continuous version of the rule; see (3.72). In a historical context, it is 
also of interest to note that the full generality of (3.72) was not actually perceived by Bayes; rather, 
the task of generalization was left to Laplace. 

6. It is because of this duality that the Bayesian paradigm is referred to as a principle of duality; see 
Robert (2001). Robert’s book presents a detailed and readable treatment of the Bayesian paradigm. 
For a more advanced treatment of the subject, see Bernardo and Smith (1998). 

7. In a paper published in 1912, R.A. Fisher moved away from the Bayesian approach. Then, in a 
classic paper published in 1922, he introduced the likelihood. 

8. In Appendix B of their book, Bernardo and Smith (1998) show that many non-Bayesian inference 
procedures do not lead to identical inferences when applied to such proportional likelihoods. 

9. For detailed discussion of the sufficient statistic, see Bernardo and Smith (1998). 

10. A more detailed treatment of parameter-estimation theory is presented in the classic book by 
Van Trees (1968); the notation used by Van Trees is somewhat different from that used in this 
chapter. See also the book by McDonough and Whalen (1995). 

11. For a more detailed treatment and readable account of hypothesis testing, see the classic book 
by Van Trees (1968). See also the book by McDonough and Whalen (1995). 
Introduction


Stated in simple terms, we may say: 


Elaborating on this succinct statement, we find that in many of the real-life phenomena 
encountered in practice, time features prominently in their description. Moreover, their 
actual behavior has a random appearance. Referring back to the example of wireless 
communications briefly described in Section 3.1, we find that the received signal at the 
wireless channel output varies randomly with time. Processes of this kind are said to be
random or stochastic; hereafter, we will use the term “stochastic.” Although probability
theory does not involve time, the study of stochastic processes naturally builds on 
probability theory. 

The way to think about the relationship between probability theory and stochastic 
processes is as follows. When we consider the statistical characterization of a stochastic 
process at a particular instant of time, we are basically dealing with the characterization of 
a random variable sampled (i.e., observed) at that instant of time. When, however, we 
consider a single realization of the process, we have a random waveform that evolves 
across time. The study of stochastic processes, therefore, embodies two approaches: one 
based on ensemble averaging and the other based on temporal averaging. Both 
approaches and their characterizations are considered in this chapter. 

Although it is not possible to predict the exact value of a signal drawn from a stochastic 
process, it is possible to characterize the process in terms of statistical parameters such as 
average power, correlation functions, and power spectra. This chapter is devoted to the 
mathematical definitions, properties, and measurements of these functions, and related issues. 

Mathematical Definition of a Stochastic Process 


To summarize the introduction: stochastic processes have two properties. First, they are
functions of time. Second, they are random in the sense that, before conducting an experiment,
it is not possible to define exactly the waveforms that will be observed in the future.

In describing a stochastic process, it is convenient to think in terms of a sample space. 
Specifically, each realization of the process is associated with a sample point. The totality 
of sample points corresponding to the aggregate of all possible realizations of the 
stochastic process is called the sample space. Unlike the sample space in probability 
theory, each sample point of the sample space pertaining to a stochastic process is a
function of time. We may therefore think of a stochastic process as the sample space or 
ensemble composed of functions of time. As an integral part of this way of thinking, we 
assume the existence of a probability distribution defined over an appropriate class of sets 
in the sample space, so that we may speak with confidence of the probability of various 
events observed at different points of time. 

Consider, then, a stochastic process specified by

• outcomes s observed from some sample space S;
• events defined on the sample space S; and
• probabilities of these events.

Suppose that we assign to each sample point s a function of time in accordance with the rule

$$X(t, s), \quad -T \le t \le T$$

where 2T is the total observation interval. For a fixed sample point $s_j$, the graph of the
function $X(t, s_j)$ versus time t is called a realization or sample function of the stochastic
process. To simplify the notation, we denote this sample function as

$$x_j(t) = X(t, s_j), \quad -T \le t \le T$$

Figure 4.1 An ensemble of sample functions; $x_1(t)$, $x_2(t)$, ..., $x_n(t)$ are the outcomes of the first, second, ..., nth trials of the experiment.

Figure 4.1 illustrates a set of sample functions $\{x_j(t) \mid j = 1, 2, \ldots, n\}$. From this figure, we
see that, for a fixed time $t_k$ inside the observation interval, the set of numbers

$$\{x_1(t_k), x_2(t_k), \ldots, x_n(t_k)\} = \{X(t_k, s_1), X(t_k, s_2), \ldots, X(t_k, s_n)\}$$

constitutes a random variable. Thus, a stochastic process X(t, s) is represented by the time-indexed
ensemble (family) of random variables {X(t, s)}. To simplify the notation, the
customary practice is to suppress the s and simply use X(t) to denote a stochastic process.
We may now formally introduce the definition:


Moreover, we may distinguish between a random variable and a random process as 
follows. For a random variable, the outcome of a stochastic experiment is mapped into a 
number. On the other hand, for a stochastic process, the outcome of a stochastic 
experiment is mapped into a waveform that is a function of time. 
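The two viewpoints can be made concrete with a small sketch. Fixing the outcome gives a waveform (a function of time), whereas fixing the time gives a set of numbers across the ensemble, that is, observations of a random variable. The toy process X(t, s) = a(s) cos(2πt) with a Gaussian amplitude a(s), and all function names, are assumptions introduced only for this illustration.

```python
import math
import random


def make_sample_function(rng):
    """Draw one outcome s_j and return the sample function x_j(t) = X(t, s_j).

    Assumed toy process: X(t, s) = a(s) * cos(2*pi*t), with a(s) Gaussian.
    """
    amplitude = rng.gauss(0.0, 1.0)  # the random outcome of one trial
    return lambda t: amplitude * math.cos(2.0 * math.pi * t)


def build_ensemble(n, seed=0):
    """Return n sample functions, one per independent trial of the experiment."""
    rng = random.Random(seed)
    return [make_sample_function(rng) for _ in range(n)]


def observe_at(ensemble, t_k):
    """Fix the time t_k and collect {x_1(t_k), ..., x_n(t_k)} across the ensemble."""
    return [x(t_k) for x in ensemble]


if __name__ == "__main__":
    ensemble = build_ensemble(5)
    print(observe_at(ensemble, 0.0))            # five observations of X(0)
    print([ensemble[0](t) for t in (0.0, 0.1)])  # one waveform at two times
```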

Two Classes of Stochastic Processes: Strictly Stationary and 
Weakly Stationary 


In dealing with stochastic processes encountered in the real world, we often find that the 
statistical characterization of a process is independent of the time at which observation of 
the process is initiated. That is, if such a process is divided into a number of time intervals, 
the various sections of the process exhibit essentially the same statistical properties. Such 
a stochastic process is said to be stationary. Otherwise, it is said to be nonstationary. 
Generally speaking, we may say: 


To be more precise, consider a stochastic process X(t) that is initiated at $t = -\infty$. Let
$X(t_1), X(t_2), \ldots, X(t_k)$ denote the random variables obtained by sampling the process X(t) at
times $t_1, t_2, \ldots, t_k$, respectively. The joint (cumulative) distribution function of this set of
random variables is $F_{X(t_1), \ldots, X(t_k)}(x_1, \ldots, x_k)$. Suppose next we shift all the sampling
times by a fixed amount $\tau$ denoting the time shift, thereby obtaining the new set of random
variables: $X(t_1 + \tau), X(t_2 + \tau), \ldots, X(t_k + \tau)$. The joint distribution function of this latter set of
random variables is $F_{X(t_1 + \tau), \ldots, X(t_k + \tau)}(x_1, \ldots, x_k)$. The stochastic process X(t) is said to
be stationary in the strict sense, or strictly stationary, if the invariance condition

$$F_{X(t_1 + \tau), \ldots, X(t_k + \tau)}(x_1, \ldots, x_k) = F_{X(t_1), \ldots, X(t_k)}(x_1, \ldots, x_k) \tag{4.2}$$

holds for all values of time shift $\tau$, all positive integers k, and any possible choice of
sampling times $t_1, \ldots, t_k$. In other words, we may state:


Note that the finite-dimensional distributions in (4.2) depend on the relative time 
separation between random variables, but not on their absolute time. That is, the stochastic 
process has the same probabilistic behavior throughout the global time t. 




Stochastic Processes 


Similarly, we may say that two stochastic processes X(t) and Y(t) are jointly strictly
stationary if the joint finite-dimensional distributions of the two sets of stochastic
variables $X(t_1), \ldots, X(t_k)$ and $Y(t'_1), \ldots, Y(t'_j)$ are invariant with respect to the origin
t = 0 for all positive integers k and j, and all choices of the sampling times $t_1, \ldots, t_k$ and
$t'_1, \ldots, t'_j$.

Returning to (4.2), we may identify two important properties: 

1. For k = 1, we have

$$F_{X(t)}(x) = F_{X(t + \tau)}(x) = F_X(x) \quad \text{for all } t \text{ and } \tau$$

In words, the first-order distribution function of a strictly stationary stochastic
process is independent of time t.

2. For k = 2 and $\tau = -t_2$, we have

$$F_{X(t_1), X(t_2)}(x_1, x_2) = F_{X(t_1 - t_2), X(0)}(x_1, x_2) \quad \text{for all } t_1 \text{ and } t_2$$

In words, the second-order distribution function of a strictly stationary stochastic
process depends only on the time difference between the sampling instants and not
on the particular times at which the stochastic process is sampled.

These two properties have profound practical implications for the statistical 
parameterization of a strictly stationary stochastic process, as discussed in Section 4.4. 


Multiple Spatial Windows for Illustrating Strict Stationarity 

Consider Figure 4.2, depicting three spatial windows located at times $t_1$, $t_2$, $t_3$. We wish to
evaluate the probability of obtaining a sample function x(t) of a stochastic process X(t) that
passes through this set of windows; that is, the probability of the joint event

$$P(A) = F_{X(t_1), X(t_2), X(t_3)}(b_1, b_2, b_3) - F_{X(t_1), X(t_2), X(t_3)}(a_1, a_2, a_3)$$

Suppose now the stochastic process X(t) is known to be strictly stationary. An implication 
of strict stationarity is that the probability of the set of sample functions of this process 
passing through the windows of Figure 4.3a is equal to the probability of the set of sample 
functions passing through the corresponding time-shifted windows of Figure 4.3b. Note, 
however, that it is not necessary that these two sets consist of the same sample functions. 
Figure 4.2 Illustrating the probability of a joint event.
Figure 4.3 Illustrating the concept of stationarity in Example 1.
Another important class of stochastic processes is the so-called weakly stationary
processes. To be specific, a stochastic process X(t) is said to be weakly stationary if its
second-order moments satisfy the following two conditions:

1. The mean of the process X(t) is constant for all time t.

2. The autocorrelation function of the process X(t) depends solely on the difference
between any two times at which the process is sampled; the “auto” in autocorrelation
refers to the correlation of the process with itself.

In this book we focus on weakly stationary processes whose second-order statistics satisfy 
conditions 1 and 2; both of them are easy to measure and considered to be adequate for 
practical purposes. Such processes are also referred to as wide-sense stationary processes 
in the literature. Henceforth, both terminologies are used interchangeably. 


Mean, Correlation, and Covariance Functions of 
Weakly Stationary Processes 


Consider a real-valued stochastic process X(t). We define the mean of the process X(t) as
the expectation of the random variable obtained by sampling the process at some time t, as
shown by

$$\mu_X(t) = E[X(t)] = \int_{-\infty}^{\infty} x f_{X(t)}(x)\, dx \tag{4.5}$$

where $f_{X(t)}(x)$ is the first-order probability density function of the process X(t), observed
at time t; note also that the use of single X as subscript in $\mu_X(t)$ is intended to emphasize
the fact that $\mu_X(t)$ is a first-order moment. For the mean $\mu_X(t)$ to be a constant for all time t
so that the process X(t) satisfies the first condition of weak stationarity, we require that
$f_{X(t)}(x)$ be independent of time t. Consequently, (4.5) simplifies to

$$\mu_X(t) = \mu_X \quad \text{for all } t \tag{4.6}$$

We next define the autocorrelation function of the stochastic process X(t) as the
expectation of the product of two random variables, $X(t_1)$ and $X(t_2)$, obtained by sampling
the process X(t) at times $t_1$ and $t_2$, respectively. Specifically, we write

$$M_{XX}(t_1, t_2) = E[X(t_1) X(t_2)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_1 x_2 f_{X(t_1), X(t_2)}(x_1, x_2)\, dx_1\, dx_2 \tag{4.7}$$

where $f_{X(t_1), X(t_2)}(x_1, x_2)$ is the joint probability density function of the process X(t)
sampled at times $t_1$ and $t_2$; here, again, note that the use of the double X subscripts is
intended to emphasize the fact that $M_{XX}(t_1, t_2)$ is a second-order moment. For $M_{XX}(t_1, t_2)$ to
depend only on the time difference $t_2 - t_1$ so that the process X(t) satisfies the second
condition of weak stationarity, it is necessary for $f_{X(t_1), X(t_2)}(x_1, x_2)$ to depend only on the
time difference $t_2 - t_1$. Consequently, (4.7) reduces to

$$M_{XX}(t_1, t_2) = E[X(t_1) X(t_2)] = R_{XX}(t_2 - t_1) \quad \text{for all } t_1 \text{ and } t_2 \tag{4.8}$$

In (4.8) we have purposely used two different symbols for the autocorrelation function:
$M_{XX}(t_1, t_2)$ for any stochastic process X(t) and $R_{XX}(t_2 - t_1)$ for a stochastic process that is
weakly stationary.

Similarly, the autocovariance function of a weakly stationary process X(t) is defined by

$$C_{XX}(t_1, t_2) = E[(X(t_1) - \mu_X)(X(t_2) - \mu_X)] = R_{XX}(t_2 - t_1) - \mu_X^2 \tag{4.9}$$

Equation (4.9) shows that, like the autocorrelation function, the autocovariance function of
a weakly stationary process X(t) depends only on the time difference $(t_2 - t_1)$. This
equation also shows that if we know the mean and the autocorrelation function of the 
process X(t), we can uniquely determine the autocovariance function. The mean and 
autocorrelation function are therefore sufficient to describe the first two moments of the 
process. 

However, two important points should be carefully noted: 

1. The mean and autocorrelation function only provide a weak description of the
distribution of the stochastic process X(t).

2. The conditions involved in defining (4.6) and (4.8) are not sufficient to guarantee the
stochastic process X(t) to be strictly stationary, which emphasizes a remark that was
made in the preceding section.

Nevertheless, practical considerations often dictate that we simply limit ourselves to a
weak description of the process given by the mean and autocorrelation function because
the computation of higher-order moments can be intractable.

Henceforth, the treatment of stochastic processes is confined to weakly stationary processes,
for which the definitions of the second-order moments in (4.6), (4.8), and (4.9) hold.
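When an ensemble of realizations is available, the expectations above may be approximated by averages across the ensemble. The sketch below is one way of doing so. It assumes each realization is stored as a list of samples taken at common time instants, and the function names are our own. With sample means used in place of $\mu_X$, the estimates satisfy the counterpart of (4.9): autocovariance equals autocorrelation minus the product of the two means.

```python
def ensemble_mean(ensemble, k):
    """Estimate the mean at time index k by averaging x_j[k] over realizations j."""
    return sum(x[k] for x in ensemble) / len(ensemble)


def ensemble_autocorrelation(ensemble, k1, k2):
    """Estimate E[X(t_k1) X(t_k2)] by averaging products across the ensemble."""
    return sum(x[k1] * x[k2] for x in ensemble) / len(ensemble)


def ensemble_autocovariance(ensemble, k1, k2):
    """Estimate E[(X(t_k1) - mean)(X(t_k2) - mean)] using the sample means."""
    m1 = ensemble_mean(ensemble, k1)
    m2 = ensemble_mean(ensemble, k2)
    return sum((x[k1] - m1) * (x[k2] - m2) for x in ensemble) / len(ensemble)


if __name__ == "__main__":
    # Three hypothetical realizations, each sampled at two time instants.
    realizations = [[1.0, 2.0], [3.0, 5.0], [2.0, 2.0]]
    r = ensemble_autocorrelation(realizations, 0, 1)
    c = ensemble_autocovariance(realizations, 0, 1)
    m = ensemble_mean(realizations, 0) * ensemble_mean(realizations, 1)
    print(r, c, r - m)
```

For a weakly stationary process the two means coincide, so the product of means becomes $\mu_X^2$ as in (4.9).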


For convenience of notation, we reformulate the definition of the autocorrelation function
of a weakly stationary process X(t), presented in (4.8), as

$$R_{XX}(\tau) = E[X(t + \tau) X(t)] \quad \text{for all } t \tag{4.10}$$

where $\tau$ denotes a time shift; that is, $t = t_2$ and $\tau = t_1 - t_2$. This autocorrelation function
has several important properties.

Mean-square Value 

The mean-square value of a weakly stationary process X(t) is obtained from $R_{XX}(\tau)$ simply
by putting $\tau = 0$ in (4.10), as shown by

$$R_{XX}(0) = E[X^2(t)] \tag{4.11}$$


Symmetry 

The autocorrelation function $R_{XX}(\tau)$ of a weakly stationary process X(t) is an even
function of the time shift $\tau$; that is,

$$R_{XX}(\tau) = R_{XX}(-\tau) \tag{4.12}$$

This property follows directly from (4.10). Accordingly, we may also define the
autocorrelation function $R_{XX}(\tau)$ as

$$R_{XX}(\tau) = E[X(t) X(t - \tau)]$$

In words, we may say that a graph of the autocorrelation function $R_{XX}(\tau)$, plotted versus $\tau$,
is symmetric about the origin. 

Bound on the Autocorrelation Function 

The autocorrelation function $R_{XX}(\tau)$ attains its maximum magnitude at $\tau = 0$; that is,

$$|R_{XX}(\tau)| \le R_{XX}(0) \tag{4.13}$$

To prove this property, consider the nonnegative quantity

$$E[(X(t + \tau) \pm X(t))^2] \ge 0$$

Expanding terms and taking their individual expectations, we readily find that

$$E[X^2(t + \tau)] \pm 2E[X(t + \tau) X(t)] + E[X^2(t)] \ge 0$$

which, in light of (4.11) and (4.12), reduces to

$$2R_{XX}(0) \pm 2R_{XX}(\tau) \ge 0$$

Equivalently, we may write

$$-R_{XX}(0) \le R_{XX}(\tau) \le R_{XX}(0)$$

from which (4.13) follows directly.


Normalization 


Values of the normalized autocorrelation function

$$\rho_{XX}(\tau) = \frac{R_{XX}(\tau)}{R_{XX}(0)}$$

are confined to the range [-1, 1].

This last property follows directly from (4.13). 


The autocorrelation function $R_{XX}(\tau)$ is significant because it provides a means of
describing the interdependence of two random variables obtained by sampling the
stochastic process X(t) at times $\tau$ seconds apart. It is apparent, therefore, that the more
rapidly the stochastic process X(t) changes with time, the more rapidly will the
autocorrelation function $R_{XX}(\tau)$ decrease from its maximum $R_{XX}(0)$ as $\tau$ increases, as
illustrated in Figure 4.4. This behavior of the autocorrelation function may be
characterized by a decorrelation time $\tau_{\text{dec}}$, such that, for $\tau > \tau_{\text{dec}}$, the magnitude of the
autocorrelation function $R_{XX}(\tau)$ remains below some prescribed value. We may thus
introduce the following definition:


For the example used in this definition, the parameter $\tau_{\text{dec}}$ is referred to as the one-percent
decorrelation time.
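To make the notion concrete, the sketch below searches a grid of time shifts for the smallest shift beyond which $|R_{XX}(\tau)|$ stays at or below a prescribed fraction of $R_{XX}(0)$. We take that fraction to be 1%, which is our reading of “one-percent decorrelation time.” The exponential autocorrelation function used in the example, the grid, and the function name are assumptions made only for illustration.

```python
import math


def decorrelation_time(r_xx, tau_grid, fraction=0.01):
    """Smallest tau in tau_grid beyond which |R_XX| stays <= fraction * R_XX(0).

    r_xx is a callable autocorrelation function; tau_grid is an increasing
    list of nonnegative time shifts. Returns None if no such tau exists.
    """
    threshold = fraction * r_xx(0.0)
    result = None
    # Scan from the largest shift downward and stop at the first violation.
    for tau in reversed(tau_grid):
        if abs(r_xx(tau)) > threshold:
            break
        result = tau
    return result


if __name__ == "__main__":
    grid = [0.001 * i for i in range(10001)]      # shifts from 0 to 10 seconds
    r_exp = lambda tau: math.exp(-abs(tau))       # assumed R_XX(tau)
    print(decorrelation_time(r_exp, grid), math.log(100.0))
```

For the assumed exponential form the grid search should land close to ln(100) seconds, since exp(-τ) = 0.01 at τ = ln(100).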

Figure 4.4 Illustrating the autocorrelation functions of slowly and rapidly fluctuating stochastic processes.

Sinusoidal Wave with Random Phase

Consider a sinusoidal signal with random phase, defined by

$$X(t) = A \cos(2\pi f_c t + \Theta) \tag{4.15}$$

where A and $f_c$ are constants and $\Theta$ is a random variable that is uniformly distributed over
the interval $[-\pi, \pi]$; that is,

$$f_\Theta(\theta) = \begin{cases} \dfrac{1}{2\pi}, & -\pi \le \theta \le \pi \\ 0, & \text{elsewhere} \end{cases} \tag{4.16}$$

According to (4.16), the random variable $\Theta$ is equally likely to have any value $\theta$ in the
interval $[-\pi, \pi]$. Each value of $\theta$ corresponds to a point in the sample space S of the
stochastic process X(t).

The process X(t) defined by (4.15) and (4.16) may represent a locally generated carrier
in the receiver of a communication system, which is used in the demodulation of a
received signal. In such an application, the random variable $\Theta$ in (4.15) accounts for
uncertainties experienced in the course of signal transmission across the communication
channel.

The autocorrelation function of X(t) is

$$\begin{aligned}
R_{XX}(\tau) &= E[X(t + \tau) X(t)] \\
&= E[A^2 \cos(2\pi f_c t + 2\pi f_c \tau + \Theta) \cos(2\pi f_c t + \Theta)] \\
&= \frac{A^2}{2} E[\cos(4\pi f_c t + 2\pi f_c \tau + 2\Theta)] + \frac{A^2}{2} E[\cos(2\pi f_c \tau)] \\
&= \frac{A^2}{2} \int_{-\pi}^{\pi} \frac{1}{2\pi} \cos(4\pi f_c t + 2\pi f_c \tau + 2\theta)\, d\theta + \frac{A^2}{2} \cos(2\pi f_c \tau)
\end{aligned}$$

The first term integrates to zero, so we simply have

$$R_{XX}(\tau) = \frac{A^2}{2} \cos(2\pi f_c \tau)$$

which is plotted in Figure 4.5. From this figure we see that the autocorrelation function of
a sinusoidal wave with random phase is another sinusoid at the same frequency in the
“local time domain” denoted by the time shift $\tau$ rather than the global time domain
denoted by t.
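The closed-form result can be checked by ensemble averaging over random phases. The sketch below draws $\Theta$ uniformly from $[-\pi, \pi]$ and averages the product $X(t + \tau)X(t)$. The numerical values of A, $f_c$, and t, the trial count, and the function names are assumptions chosen for illustration. The estimate should not depend on the global time t, only on the shift $\tau$.

```python
import math
import random


def estimate_autocorrelation(tau, t=0.3, amplitude=2.0, fc=5.0, trials=20000, seed=7):
    """Ensemble estimate of E[X(t + tau) X(t)] for X(t) = A cos(2 pi fc t + Theta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        theta = rng.uniform(-math.pi, math.pi)  # Theta ~ Uniform[-pi, pi]
        x_t = amplitude * math.cos(2.0 * math.pi * fc * t + theta)
        x_shifted = amplitude * math.cos(2.0 * math.pi * fc * (t + tau) + theta)
        total += x_shifted * x_t
    return total / trials


def theoretical_autocorrelation(tau, amplitude=2.0, fc=5.0):
    """Closed form (A^2 / 2) cos(2 pi fc tau)."""
    return 0.5 * amplitude ** 2 * math.cos(2.0 * math.pi * fc * tau)


if __name__ == "__main__":
    for tau in (0.0, 0.03, 0.1):
        print(tau, estimate_autocorrelation(tau), theoretical_autocorrelation(tau))
```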



Figure 4.5 Autocorrelation function of a sine wave with random phase.

Random Binary Wave

Figure 4.6 shows the sample function x(t) of a weakly stationary process X(t) consisting of
a random sequence of binary symbols 1 and 0. Three assumptions are made:

1. The symbols 1 and 0 are represented by pulses of amplitude +A and -A volts,
respectively, and duration T seconds.

2. The pulses are not synchronized, so the starting time $t_d$ of the first complete pulse
for positive time is equally likely to lie anywhere between zero and T seconds. That
is, $t_d$ is the sample value of a uniformly distributed random variable $T_d$, whose
probability density function is defined by

$$f_{T_d}(t_d) = \begin{cases} \dfrac{1}{T}, & 0 \le t_d \le T \\ 0, & \text{elsewhere} \end{cases}$$

3. During any time interval $(n - 1)T < t - t_d < nT$, where n is a positive integer, the
presence of a 1 or a 0 is determined by tossing a fair coin. Specifically, if the
outcome is heads, we have a 1; if the outcome is tails, we have a 0. These two
symbols are thus equally likely, and the presence of a 1 or 0 in any one interval is
independent of all other intervals.

Since the amplitude levels -A and +A occur with equal probability, it follows immediately
that E[X(t)] = 0 for all t and the mean of the process is therefore zero.

Figure 4.6 Sample function of random binary wave.

To find the autocorrelation function, we have to evaluate the expectation
$E[X(t_k) X(t_i)]$, where $X(t_k)$ and $X(t_i)$ are random variables obtained by sampling the
stochastic process X(t) at times $t_k$ and $t_i$ respectively. To proceed further, we need to
consider two distinct conditions:

Condition 1: $|t_k - t_i| > T$

Under this condition, the random variables $X(t_k)$ and $X(t_i)$ occur in different pulse intervals
and are therefore independent. We thus have

$$E[X(t_k) X(t_i)] = E[X(t_k)]\, E[X(t_i)] = 0, \quad |t_k - t_i| > T$$

Condition 2: $|t_k - t_i| < T$, with $t_k = 0$ and $t_i < t_k$

Under this second condition, we observe from Figure 4.6 that the random variables $X(t_k)$
and $X(t_i)$ occur in the same pulse interval if, and only if, the delay $t_d$ satisfies the condition
$t_d < T - |t_k - t_i|$. We thus have the conditional expectation

$$E[X(t_k) X(t_i) \mid t_d] = \begin{cases} A^2, & t_d < T - |t_k - t_i| \\ 0, & \text{elsewhere} \end{cases}$$

Averaging this result over all possible values of $t_d$, we get

$$E[X(t_k) X(t_i)] = \int_0^{T - |t_k - t_i|} A^2 f_{T_d}(t_d)\, dt_d = \int_0^{T - |t_k - t_i|} \frac{A^2}{T}\, dt_d = A^2 \left(1 - \frac{|t_k - t_i|}{T}\right), \quad |t_k - t_i| < T$$

By similar reasoning for any other value of $t_k$, we conclude that the autocorrelation
function of a random binary wave, represented by the sample function shown in Figure
4.6, is only a function of the time difference $\tau = t_k - t_i$, as shown by

$$R_{XX}(\tau) = \begin{cases} A^2 \left(1 - \dfrac{|\tau|}{T}\right), & |\tau| < T \\ 0, & |\tau| \ge T \end{cases} \tag{4.18}$$

This triangular result, described in (4.18), is plotted in Figure 4.7. 
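The triangular form can also be checked by simulation under the three assumptions of this example. In the sketch below, each trial draws a random delay $t_d$, assigns independent fair-coin amplitudes to the pulse intervals that are actually visited, and averages the product of the wave at two instants separated by $\tau$. The observation time, trial count, and function names are assumptions for illustration.

```python
import math
import random


def simulate_autocorrelation(tau, t=3.2, amplitude=1.0, T=1.0, trials=20000, seed=11):
    """Ensemble estimate of E[X(t + tau) X(t)] for the random binary wave."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        t_d = rng.uniform(0.0, T)  # random start time of the pulse grid
        symbols = {}               # pulse index -> amplitude, drawn on demand

        def level(time):
            index = math.floor((time - t_d) / T)
            if index not in symbols:
                # Fair coin: +A for symbol 1, -A for symbol 0.
                symbols[index] = amplitude if rng.random() < 0.5 else -amplitude
            return symbols[index]

        total += level(t + tau) * level(t)
    return total / trials


def triangular_autocorrelation(tau, amplitude=1.0, T=1.0):
    """A^2 (1 - |tau| / T) for |tau| < T, and zero otherwise."""
    if abs(tau) >= T:
        return 0.0
    return amplitude ** 2 * (1.0 - abs(tau) / T)


if __name__ == "__main__":
    for tau in (0.0, 0.25, 0.5, 1.5):
        print(tau, simulate_autocorrelation(tau), triangular_autocorrelation(tau))
```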



Figure 4.7 Autocorrelation function of random binary wave.


Consider next the more general case of two stochastic processes X(t) and Y(t) with
autocorrelation functions $M_{XX}(t, u)$ and $M_{YY}(t, u)$ respectively. There are two possible
cross-correlation functions of X(t) and Y(t) to be considered.
Specifically, we have

$$M_{XY}(t, u) = E[X(t) Y(u)]$$

and

$$M_{YX}(t, u) = E[Y(t) X(u)]$$

where t and u denote two values of the global time at which the processes are observed.
All four correlation parameters of the two stochastic processes X(t) and Y(t) may now be
displayed conveniently in the form of the two-by-two matrix

$$\mathbf{M}(t, u) = \begin{bmatrix} M_{XX}(t, u) & M_{XY}(t, u) \\ M_{YX}(t, u) & M_{YY}(t, u) \end{bmatrix}$$

which is called the cross-correlation matrix of the stochastic processes X(t) and Y(t). If the
stochastic processes X(t) and Y(t) are each weakly stationary and, in addition, they are
jointly stationary, then the correlation matrix can be expressed by

$$\mathbf{R}(\tau) = \begin{bmatrix} R_{XX}(\tau) & R_{XY}(\tau) \\ R_{YX}(\tau) & R_{YY}(\tau) \end{bmatrix}$$

where the time shift $\tau = u - t$.

In general, the cross-correlation function is not an even function of the time shift $\tau$ as
was true for the autocorrelation function, nor does it have a maximum at the origin.
However, it does obey a certain symmetry relationship, described by

$$R_{XY}(\tau) = R_{YX}(-\tau) \tag{4.22}$$


Quadrature-Modulated Processes 

Consider a pair of quadrature-modulated processes $X_1(t)$ and $X_2(t)$ that are respectively
related to a weakly stationary process X(t) as follows:

$$X_1(t) = X(t) \cos(2\pi f_c t + \Theta)$$
$$X_2(t) = X(t) \sin(2\pi f_c t + \Theta)$$

where $f_c$ is a carrier frequency and the random variable $\Theta$ is uniformly distributed over the
interval $[0, 2\pi]$. Moreover, $\Theta$ is independent of X(t). One cross-correlation function of
$X_1(t)$ and $X_2(t)$ is given by

$$\begin{aligned}
R_{12}(\tau) &= E[X_1(t) X_2(t - \tau)] \\
&= E[X(t) X(t - \tau) \cos(2\pi f_c t + \Theta) \sin(2\pi f_c t - 2\pi f_c \tau + \Theta)] \\
&= E[X(t) X(t - \tau)]\, E[\cos(2\pi f_c t + \Theta) \sin(2\pi f_c t - 2\pi f_c \tau + \Theta)] \\
&= \frac{1}{2} R_{XX}(\tau)\, E[\sin(4\pi f_c t - 2\pi f_c \tau + 2\Theta) - \sin(2\pi f_c \tau)] \\
&= -\frac{1}{2} R_{XX}(\tau) \sin(2\pi f_c \tau)
\end{aligned}$$

where, in the last line, we have made use of the uniform distribution of the random
variable $\Theta$, representing phase. Invoking (4.22), we find that the other cross-correlation
function of $X_1(t)$ and $X_2(t)$ is given by

$$R_{21}(\tau) = \frac{1}{2} R_{XX}(-\tau) \sin(2\pi f_c \tau) = \frac{1}{2} R_{XX}(\tau) \sin(2\pi f_c \tau)$$

At $\tau = 0$, the factor $\sin(2\pi f_c \tau)$ is zero, in which case we have

$$R_{12}(0) = R_{21}(0) = 0$$

This result shows that the random variables obtained by simultaneously sampling the
quadrature-modulated processes $X_1(t)$ and $X_2(t)$ at some fixed value of time t are
orthogonal to each other.
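The phase-averaging step in this example can be verified numerically. Because $\Theta$ is independent of X(t), the cross-correlation factors into $R_{XX}(\tau)$ times an average over the phase. The sketch below computes that average on a uniform grid of phase values in $[0, 2\pi)$, which stands in for the expectation over $\Theta$. The exponential $R_{XX}(\tau)$ used in the demonstration, the parameter values, and the function names are assumptions for illustration.

```python
import math


def phase_average(t, tau, fc=4.0, n=360):
    """Average of cos(2 pi fc t + theta) sin(2 pi fc (t - tau) + theta) over theta."""
    total = 0.0
    for i in range(n):
        theta = 2.0 * math.pi * i / n  # uniform grid on [0, 2 pi)
        carrier_1 = math.cos(2.0 * math.pi * fc * t + theta)
        carrier_2 = math.sin(2.0 * math.pi * fc * (t - tau) + theta)
        total += carrier_1 * carrier_2
    return total / n


def cross_correlation_r12(tau, r_xx, fc=4.0, t=0.37):
    """R_12(tau) = R_XX(tau) times the phase average, by independence of Theta and X(t)."""
    return r_xx(tau) * phase_average(t, tau, fc)


if __name__ == "__main__":
    r_xx = lambda tau: math.exp(-abs(tau))  # assumed autocorrelation of X(t)
    for tau in (0.0, 0.05, 0.1):
        closed_form = -0.5 * r_xx(tau) * math.sin(2.0 * math.pi * 4.0 * tau)
        print(tau, cross_correlation_r12(tau, r_xx), closed_form)
```

At zero shift the computed value vanishes to within rounding error, in agreement with the orthogonality statement above.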


Ergodic Processes 


Ergodic processes are subsets of weakly stationary processes. Most importantly, from a 
practical perspective, the property of ergoclicity permits us to substitute time averages for 
ensemble averages. 

To elaborate on these two succinct statements, we know that the expectations or 
ensemble averages of a stochastic process X(t) are averages “across the process.” For 
example, the mean of a stochastic process X(t) at some fixed time 1^ is the expectation of 
the random variable X(tf) that describes all possible values of sample functions of the 
process X(t ) sampled at time t = t ^ Naturally, we may also define long-term sample 
averages or time averages that are averages “along the process.” Whereas in ensemble 
averaging we consider a set of independent realizations of the process X(t ) sampled at 
some fixed time in time averaging we focus on a single waveform evolving across time 
t and representing one waveform realization of the process X(t). 

With time averages providing the basis of a practical method for possible estimation of 
ensemble averages of a stochastic process, we would like to explore the conditions under 
which this estimation is justifiable. To address this important issue, consider the sample 
function x(t) of a weakly stationary process X(t) observed over the interval $-T \le t \le T$. The
time-average value of the sample function x(t) is defined by the definite integral

$$
\mu_x(T) = \frac{1}{2T}\int_{-T}^{T} x(t)\,dt \tag{4.24}
$$

Clearly, the time average $\mu_x(T)$ is a random variable, as its value depends on the
observation interval and which particular sample function of the process X(t) is picked for
use in (4.24). Since the process X(t) is assumed to be weakly stationary, the mean of the
time average $\mu_x(T)$ is given by (after interchanging the operations of expectation and
integration, which is permissible because both operations are linear)


$$
\mathbb{E}[\mu_x(T)] = \frac{1}{2T}\int_{-T}^{T}\mathbb{E}[x(t)]\,dt
= \frac{1}{2T}\int_{-T}^{T}\mu_X\,dt = \mu_X
$$

where $\mu_X$ is the mean of the process X(t). Accordingly, the time average $\mu_x(T)$ represents
an unbiased estimate of the ensemble-averaged mean $\mu_X$. Most importantly, we say that
the process X(t) is ergodic in the mean if two conditions are satisfied:

1. The time average $\mu_x(T)$ approaches the ensemble average $\mu_X$ in the limit as the
observation interval approaches infinity; that is,

$$
\lim_{T\to\infty}\mu_x(T) = \mu_X
$$

2. The variance of $\mu_x(T)$, treated as a random variable, approaches zero in the limit as
the observation interval approaches infinity; that is,

$$
\lim_{T\to\infty}\operatorname{var}[\mu_x(T)] = 0
$$

The other time average of particular interest is the autocorrelation function $R_{xx}(\tau, T)$,
defined in terms of the sample function x(t) observed over the interval $-T \le t \le T$.
Following (4.24), we may formally define the time-averaged autocorrelation function of
x(t) as

$$
R_{xx}(\tau, T) = \frac{1}{2T}\int_{-T}^{T} x(t+\tau)\,x(t)\,dt
$$

This second time average should also be viewed as a random variable with a mean and
variance of its own. In a manner similar to ergodicity of the mean, we say that the process
X(t) is ergodic in the autocorrelation function if the following two limiting conditions are
satisfied:

$$
\lim_{T\to\infty} R_{xx}(\tau, T) = R_{XX}(\tau)
$$

$$
\lim_{T\to\infty}\operatorname{var}[R_{xx}(\tau, T)] = 0
$$

With the property of ergodicity confined to the mean and autocorrelation functions, it
follows that ergodic processes are subsets of weakly stationary processes. In other words,
all ergodic processes are weakly stationary; however, the converse is not necessarily true.

Transmission of a Weakly Stationary Process through a
Linear Time-invariant Filter


Suppose that a stochastic process X(t) is applied as input to a linear time-invariant filter of
impulse response h(t), producing a new stochastic process Y(t) at the filter output, as
depicted in Figure 4.8. In general, it is difficult to describe the probability distribution of
the output stochastic process Y(t), even when the probability distribution of the input
stochastic process X(t) is completely specified for the entire time interval $-\infty < t < \infty$.

Figure 4.8 Transmission of a stochastic process through a linear time-invariant filter.


For the sake of mathematical tractability, we limit the discussion in this section to the 
time-domain form of the input-output relations of the filter for defining the mean and 
autocorrelation functions of the output stochastic process Y(t) in terms of those of the 
input X(t), assuming that X(t) is a weakly stationary process. 

The transmission of a process through a linear time-invariant filter is governed by the 
convolution integral, which was discussed in Chapter 2. For the problem at hand, we may 
thus express the output stochastic process Y(t) in terms of the input stochastic process X(t) as 


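$$
Y(t) = \int_{-\infty}^{\infty} h(\tau_1)\,X(t-\tau_1)\,d\tau_1
$$

where $\tau_1$ is a local time. Hence, the mean of Y(t) is

$$
\mu_Y(t) = \mathbb{E}[Y(t)]
= \mathbb{E}\left[\int_{-\infty}^{\infty} h(\tau_1)\,X(t-\tau_1)\,d\tau_1\right] \tag{4.27}
$$

Provided that the expectation $\mathbb{E}[X(t)]$ is finite for all t and the filter is stable, we may
interchange the order of expectation and integration in (4.27), in which case we obtain

$$
\mu_Y(t) = \int_{-\infty}^{\infty} h(\tau_1)\,\mathbb{E}[X(t-\tau_1)]\,d\tau_1
= \int_{-\infty}^{\infty} h(\tau_1)\,\mu_X(t-\tau_1)\,d\tau_1 \tag{4.28}
$$

When the input stochastic process X(t) is weakly stationary, the mean $\mu_X(t)$ is a constant
$\mu_X$; therefore, we may simplify (4.28) as

$$
\mu_Y = \mu_X\int_{-\infty}^{\infty} h(\tau_1)\,d\tau_1 = \mu_X H(0) \tag{4.29}
$$

where H(0) is the zero-frequency response of the system. Equation (4.29) states:

The mean of the stochastic process Y(t) produced at the output of a linear time-invariant
filter in response to a weakly stationary input process X(t) is equal to the mean of X(t)
multiplied by the zero-frequency response of the filter.

This result is intuitively satisfying.

Consider next the autocorrelation function of the output stochastic process Y(t). By
definition, we have

$$
M_{YY}(t, u) = \mathbb{E}[Y(t)Y(u)]
$$

where t and u denote two values of the time at which the output process Y(t) is sampled.
We may therefore apply the convolution integral twice to write

$$
M_{YY}(t, u) = \mathbb{E}\left[\int_{-\infty}^{\infty} h(\tau_1)X(t-\tau_1)\,d\tau_1
\int_{-\infty}^{\infty} h(\tau_2)X(u-\tau_2)\,d\tau_2\right] \tag{4.30}
$$

Here again, provided that the mean-square value $\mathbb{E}[X^2(t)]$ is finite for all t and the filter is
stable, we may interchange the order of the expectation and the integrations with respect
to $\tau_1$ and $\tau_2$ in (4.30), obtaining

$$
\begin{aligned}
M_{YY}(t, u) &= \int_{-\infty}^{\infty} d\tau_1\, h(\tau_1)\int_{-\infty}^{\infty} d\tau_2\, h(\tau_2)\,
\mathbb{E}[X(t-\tau_1)X(u-\tau_2)] \\
&= \int_{-\infty}^{\infty} d\tau_1\, h(\tau_1)\int_{-\infty}^{\infty} d\tau_2\, h(\tau_2)\,
M_{XX}(t-\tau_1, u-\tau_2)
\end{aligned} \tag{4.31}
$$

When the input X(t) is a weakly stationary process, the autocorrelation function of X(t) is
only a function of the difference between the sampling times $t - \tau_1$ and $u - \tau_2$. Thus,
putting $\tau = u - t$ in (4.31), we may go on to write

$$
R_{YY}(\tau) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
h(\tau_1)h(\tau_2)\,R_{XX}(\tau + \tau_1 - \tau_2)\,d\tau_1\,d\tau_2 \tag{4.32}
$$

which depends only on the time difference $\tau$.

On combining the result of (4.32) with that involving the mean $\mu_Y$ in (4.29), we may
now make the following statement:

If the input to a stable linear time-invariant filter is a weakly stationary process, then the
output of the filter is also a weakly stationary process.

By definition, we have $R_{YY}(0) = \mathbb{E}[Y^2(t)]$. In light of Property 1 of the autocorrelation
function $R_{YY}(\tau)$, it follows, therefore, that the mean-square value of the output process
Y(t) is obtained by putting $\tau = 0$ in (4.32), as shown by

$$
\mathbb{E}[Y^2(t)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
h(\tau_1)h(\tau_2)\,R_{XX}(\tau_1 - \tau_2)\,d\tau_1\,d\tau_2 \tag{4.33}
$$

which, of course, is a constant.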
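The two time-domain results, (4.29) for the mean and (4.33) for the mean-square value, can be checked with a discrete-time sketch in which sums replace the integrals. Everything specific here is an assumption made for illustration: the FIR taps `h`, the input model (a constant mean plus a first-order moving average of white noise), and the helper names are hypothetical.

```python
import random

def fir_filter(x, h):
    """Discrete convolution y[n] = sum_k h[k] x[n-k], a sampled stand-in for the convolution integral."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

rng = random.Random(1)
mu_x, b = 2.0, 0.5                 # assumed input mean and moving-average coefficient
h = [0.5, 0.3, 0.2, -0.1]          # assumed impulse response (finitely many taps, hence stable)

# Weakly stationary input: x[n] = mu_x + w[n] + b*w[n-1], with w white and of unit variance.
w = [rng.gauss(0.0, 1.0) for _ in range(100001)]
x = [mu_x + w[n] + b * w[n - 1] for n in range(1, len(w))]

def r_xx(k):
    """Autocorrelation E[x[n+k] x[n]] of the assumed input (it includes mu_x**2)."""
    cov = {0: 1.0 + b * b, 1: b}.get(abs(k), 0.0)
    return mu_x ** 2 + cov

y = fir_filter(x, h)[len(h):]      # drop the start-up transient

mean_pred = mu_x * sum(h)                                   # discrete analogue of (4.29)
msq_pred = sum(h[i] * h[j] * r_xx(i - j)                    # discrete analogue of (4.33)
               for i in range(len(h)) for j in range(len(h)))
mean_est = sum(y) / len(y)
msq_est = sum(v * v for v in y) / len(y)
print("mean:        predicted", mean_pred, "estimated", mean_est)
print("mean-square: predicted", msq_pred, "estimated", msq_est)
```

In the discrete setting the zero-frequency response H(0) is simply the sum of the taps, which is why `sum(h)` appears in the predicted mean.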


Power Spectral Density of a Weakly Stationary Process 


Thus far we have considered the time-domain characterization of a weakly stationary 
process applied to a linear filter. We next study the characterization of linearly filtered 
weakly stationary processes by using frequency-domain ideas. In particular, we wish to 
derive the frequency-domain equivalent to the result of (4.33), defining the mean-square 
value of the filter output Y(t). The term “filter” used here should be viewed in a generic 
sense; for example, it may represent the channel of a communication system. 


From Chapter 2, we recall that the impulse response of a linear time-invariant filter is
equal to the inverse Fourier transform of the frequency response of the filter. Using H(f) to
denote the frequency response of the filter, we may thus write

$$
h(\tau_1) = \int_{-\infty}^{\infty} H(f)\exp(j2\pi f\tau_1)\,df
$$

Substituting this expression for $h(\tau_1)$ into (4.33) and then changing the order of
integrations, we get the triple integral

$$
\begin{aligned}
\mathbb{E}[Y^2(t)] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
\left[\int_{-\infty}^{\infty} H(f)\exp(j2\pi f\tau_1)\,df\right]
h(\tau_2)\,R_{XX}(\tau_1 - \tau_2)\,d\tau_1\,d\tau_2 \\
&= \int_{-\infty}^{\infty} df\, H(f)\int_{-\infty}^{\infty} d\tau_2\, h(\tau_2)
\int_{-\infty}^{\infty} R_{XX}(\tau_1 - \tau_2)\exp(j2\pi f\tau_1)\,d\tau_1
\end{aligned} \tag{4.35}
$$

At first, the expression on the right-hand side of (4.35) looks rather overwhelming.
However, we may simplify it considerably by first introducing the variable

$$
\tau = \tau_2 - \tau_1
$$

Then, recalling that the autocorrelation function $R_{XX}$ is an even function of its argument,
we may rewrite (4.35) in the new form

$$
\mathbb{E}[Y^2(t)] = \int_{-\infty}^{\infty} df\, H(f)
\left[\int_{-\infty}^{\infty} h(\tau_2)\exp(j2\pi f\tau_2)\,d\tau_2\right]
\int_{-\infty}^{\infty} R_{XX}(\tau)\exp(-j2\pi f\tau)\,d\tau \tag{4.36}
$$

The middle integral involving the variable $\tau_2$ inside the square brackets on the right-hand
side in (4.36) is simply $H^*(f)$, the complex conjugate of the frequency response of the
filter. Hence, using $|H(f)|^2 = H(f)H^*(f)$, where $|H(f)|$ is the magnitude response of the
filter, we may simplify (4.36) as

$$
\mathbb{E}[Y^2(t)] = \int_{-\infty}^{\infty} df\, |H(f)|^2
\left[\int_{-\infty}^{\infty} R_{XX}(\tau)\exp(-j2\pi f\tau)\,d\tau\right] \tag{4.37}
$$

We may further simplify (4.37) by recognizing that the integral inside the square brackets
in this equation with respect to the variable $\tau$ is simply the Fourier transform of the
autocorrelation function $R_{XX}(\tau)$ of the input process X(t). In particular, we may now
define a new function

$$
S_{XX}(f) = \int_{-\infty}^{\infty} R_{XX}(\tau)\exp(-j2\pi f\tau)\,d\tau \tag{4.38}
$$

The new function $S_{XX}(f)$ is called the power spectral density, or power spectrum, of the
weakly stationary process X(t). Thus, substituting (4.38) into (4.37), we obtain the simple
formula

$$
\mathbb{E}[Y^2(t)] = \int_{-\infty}^{\infty} |H(f)|^2\,S_{XX}(f)\,df \tag{4.39}
$$


which is the desired frequency-domain equivalent to the time-domain relation of (4.33). In
words, (4.39) states:

The mean-square value of the output of a stable linear time-invariant filter in response to
a weakly stationary process is equal to the integral over all frequencies of the power
spectral density of the input process multiplied by the squared magnitude response of
the filter.

To investigate the physical significance of the power spectral density, suppose that the
weakly stationary process X(t) is passed through an ideal narrowband filter with a
magnitude response $|H(f)|$ centered about the frequency $f_c$, depicted in Figure 4.9; we may
thus write

$$
|H(f)| = \begin{cases}
1, & |f \pm f_c| < \tfrac{1}{2}\Delta f \\
0, & |f \pm f_c| > \tfrac{1}{2}\Delta f
\end{cases}
$$

where $\Delta f$ is the bandwidth of the filter. From (4.39) we readily find that if the bandwidth $\Delta f$
is made sufficiently small compared with the midband frequency $f_c$ of the filter and $S_{XX}(f)$
is a continuous function of the frequency f, then the mean-square value of the filter output
is approximately given by

$$
\mathbb{E}[Y^2(t)] \approx (2\Delta f)\,S_{XX}(f) \quad \text{for all } f \tag{4.41}
$$

where, for the sake of generality, we have used f in place of $f_c$. According to (4.41),
however, the filter passes only those frequency components of the input random process
X(t) that lie inside the narrow frequency band of width $\Delta f$. We may, therefore, say that
$S_{XX}(f)$ represents the density of the average power in the weakly stationary process X(t),
evaluated at the frequency f. The power spectral density is therefore measured in watts per
hertz (W/Hz).
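The narrowband argument can be illustrated numerically. The sketch below is built on an assumption that is not part of the text: it takes the example autocorrelation $R_{XX}(\tau) = \exp(-|\tau|)$, whose Fourier transform is $2/(1 + (2\pi f)^2)$, evaluates (4.39) for the ideal narrowband filter, and compares the result with the approximation (4.41) as the bandwidth shrinks. The function names and the chosen values of `fc` and `delta_f` are hypothetical.

```python
import math

def s_xx(f):
    """Assumed example PSD: Fourier transform of R_XX(tau) = exp(-|tau|)."""
    return 2.0 / (1.0 + (2.0 * math.pi * f) ** 2)

def output_power(fc, delta_f, n=2000):
    """Mean-square output via (4.39) for the ideal narrowband filter of bandwidth delta_f."""
    lo = fc - 0.5 * delta_f
    step = delta_f / n
    band = sum(s_xx(lo + (i + 0.5) * step) for i in range(n)) * step
    return 2.0 * band            # the passbands at +fc and -fc contribute equally (S_XX is even)

fc = 1.0
rel_errors = []
for delta_f in (0.5, 0.1, 0.02):
    exact = output_power(fc, delta_f)
    approx = 2.0 * delta_f * s_xx(fc)        # right-hand side of (4.41)
    rel_errors.append(abs(exact - approx) / exact)
    print("delta_f =", delta_f, "relative error =", rel_errors[-1])
```

The relative error falls as the bandwidth is narrowed, which is the sense in which $S_{XX}(f)$ acts as a power density.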


According to (4.38), the power spectral density $S_{XX}(f)$ of a weakly stationary process X(t)
is the Fourier transform of its autocorrelation function $R_{XX}(\tau)$. Building on what we know
about Fourier theory from Chapter 2, we may go on to say that the autocorrelation
function $R_{XX}(\tau)$ is the inverse Fourier transform of the power spectral density $S_{XX}(f)$.

Figure 4.9 Magnitude response of ideal narrowband filter.

Simply put, $R_{XX}(\tau)$ and $S_{XX}(f)$ form a Fourier-transform pair, as shown by the following
pair of related equations:

$$
S_{XX}(f) = \int_{-\infty}^{\infty} R_{XX}(\tau)\exp(-j2\pi f\tau)\,d\tau \tag{4.42}
$$

$$
R_{XX}(\tau) = \int_{-\infty}^{\infty} S_{XX}(f)\exp(j2\pi f\tau)\,df \tag{4.43}
$$

These two equations are known as the Wiener-Khintchine relations, which play a
fundamental role in the spectral analysis of weakly stationary processes.

The Wiener-Khintchine relations show that if either the autocorrelation function or 
power spectral density of a weakly stationary process is known, then the other can be 
found exactly. Naturally, these functions display different aspects of correlation-related 
information about the process. Nevertheless, it is commonly accepted that, for practical 
purposes, the power spectral density is the more useful function of the two for reasons that 
will become apparent as we progress forward in this chapter and the rest of the book. 
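As a minimal numerical sketch of the Wiener-Khintchine relations, the code below assumes the example pair $R_{XX}(\tau) = \exp(-|\tau|)$ and $S_{XX}(f) = 2/(1 + (2\pi f)^2)$, which is a standard Fourier-transform pair but is not taken from the text. The integration limits, step counts, and function names are illustrative choices; the infinite integrals are truncated, so the agreement is approximate.

```python
import math

def r_xx(tau):
    return math.exp(-abs(tau))

def s_xx_closed_form(f):
    return 2.0 / (1.0 + (2.0 * math.pi * f) ** 2)

def midpoint_integral(func, lo, hi, n):
    step = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * step) for i in range(n)) * step

def s_xx_numeric(f, tau_max=40.0, n=40000):
    # R_XX is even, so (4.42) reduces to a cosine integral over tau >= 0.
    return 2.0 * midpoint_integral(
        lambda tau: r_xx(tau) * math.cos(2.0 * math.pi * f * tau), 0.0, tau_max, n)

def r_xx_numeric(tau, f_max=200.0, n=200000):
    # S_XX is even, so (4.43) reduces to a cosine integral over f >= 0.
    return 2.0 * midpoint_integral(
        lambda f: s_xx_closed_form(f) * math.cos(2.0 * math.pi * f * tau), 0.0, f_max, n)

forward_errors = [abs(s_xx_numeric(f) - s_xx_closed_form(f)) for f in (0.0, 0.1, 0.5)]
inverse_error = abs(r_xx_numeric(0.5) - r_xx(0.5))
print("forward transform errors:", forward_errors)
print("inverse transform error: ", inverse_error)
```

Either function can thus be recovered from the other, up to the truncation and discretization error of the numerical integration.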


Property 1: Zero Correlation among Frequency Components

The individual frequency components of the power spectral density $S_{XX}(f)$ of a weakly
stationary process X(t) are uncorrelated with each other.

To justify this property, consider Figure 4.10, which shows two adjacent narrow bands
of the power spectral density $S_{XX}(f)$, with the width of each band being denoted by $\Delta f$.
From this figure, we see that there is no overlap, and therefore no correlation, between the
contents of these two bands. As $\Delta f$ approaches zero, the two narrow bands will
correspondingly evolve into two adjacent frequency components of $S_{XX}(f)$, remaining
uncorrelated with each other. This important property of the power spectral density $S_{XX}(f)$
is attributed to the weak stationarity assumption of the stochastic process X(t).

Figure 4.10 Illustration of zero correlation between two adjacent narrow
bands of an example power spectral density.

Property 2: Zero-frequency Value of Power Spectral Density

The zero-frequency value of the power spectral density of a weakly stationary process
equals the total area under the graph of the autocorrelation function; that is,

$$
S_{XX}(0) = \int_{-\infty}^{\infty} R_{XX}(\tau)\,d\tau
$$

This second property follows directly from (4.42) by putting f = 0.


Property 3: Mean-square Value of Stationary Process

The mean-square value of a weakly stationary process X(t) equals the total area under the
graph of the power spectral density of the process; that is,

$$
\mathbb{E}[X^2(t)] = \int_{-\infty}^{\infty} S_{XX}(f)\,df \tag{4.45}
$$

This third property follows directly from (4.43) by putting $\tau = 0$ and using Property 1 of
the autocorrelation function described in (4.11), namely $R_{XX}(0) = \mathbb{E}[X^2(t)]$ for all t.

Property 4: Nonnegativeness of Power Spectral Density

The power spectral density of a stationary process X(t) is always nonnegative; that is,

$$
S_{XX}(f) \ge 0 \quad \text{for all } f
$$

This property is an immediate consequence of the fact that, since the mean-square
value $\mathbb{E}[Y^2(t)]$ is always nonnegative in accordance with (4.41), it follows that
$S_{XX}(f) \approx \mathbb{E}[Y^2(t)]/(2\Delta f)$ must also be nonnegative.


Property 5: Symmetry

The power spectral density of a real-valued weakly stationary process is an even function
of frequency; that is,

$$
S_{XX}(-f) = S_{XX}(f)
$$

This property is readily obtained by first substituting $-f$ for the variable f in (4.42):

$$
S_{XX}(-f) = \int_{-\infty}^{\infty} R_{XX}(\tau)\exp(j2\pi f\tau)\,d\tau
$$

Next, substituting $-\tau$ for $\tau$, and recognizing that $R_{XX}(-\tau) = R_{XX}(\tau)$ in accordance with
Property 2 of the autocorrelation function described in (4.12), we get

$$
S_{XX}(-f) = \int_{-\infty}^{\infty} R_{XX}(\tau)\exp(-j2\pi f\tau)\,d\tau = S_{XX}(f)
$$

which is the desired result. It follows, therefore, that the graph of the power spectral
density $S_{XX}(f)$, plotted versus frequency f, is symmetric about the origin.


Property 6: Normalization

The power spectral density, appropriately normalized, has the properties associated with
a probability density function in probability theory.

The normalization we have in mind here is with respect to the total area under the graph
of the power spectral density (i.e., the mean-square value of the process). Consider then
the function

$$
p_{XX}(f) = \frac{S_{XX}(f)}{\int_{-\infty}^{\infty} S_{XX}(f)\,df} \tag{4.48}
$$

In light of Properties 3 and 4, we note that $p_{XX}(f) \ge 0$ for all f. Moreover, the total area
under the function $p_{XX}(f)$ is unity. Hence, the normalized power spectral density, as
defined in (4.48), behaves in a manner similar to a probability density function.

Building on Property 6, we may go on to define the spectral distribution function of a
weakly stationary process X(t) as

$$
F_{XX}(f) = \int_{-\infty}^{f} p_{XX}(\nu)\,d\nu \tag{4.49}
$$

which has the following properties:

$F_{XX}(-\infty) = 0$
$F_{XX}(\infty) = 1$
$F_{XX}(f)$ is a nondecreasing function of the frequency f.

Conversely, we may state that every nondecreasing and bounded function $F_{XX}(f)$ is the
spectral distribution function of a weakly stationary process.

Just as important, we may also state that the spectral distribution function $F_{XX}(f)$ has all
the properties of the cumulative distribution function in probability theory, discussed in
Chapter 3.



Sinusoidal Wave with Random Phase (continued) 

Consider the stochastic process $X(t) = A\cos(2\pi f_c t + \Theta)$, where $\Theta$ is a uniformly
distributed random variable over the interval $[-\pi, \pi]$. The autocorrelation function of this
stochastic process is given by (4.17), which is reproduced here for convenience:

$$
R_{XX}(\tau) = \frac{A^2}{2}\cos(2\pi f_c \tau)
$$

Let $\delta(f)$ denote the delta function at f = 0. Taking the Fourier transform of both sides of the
formula defining $R_{XX}(\tau)$, we find that the power spectral density of the sinusoidal process
X(t) is

$$
S_{XX}(f) = \frac{A^2}{4}\left[\delta(f - f_c) + \delta(f + f_c)\right]
$$

which consists of a pair of delta functions weighted by the factor $A^2/4$ and located at $\pm f_c$,
as illustrated in Figure 4.11. Since the total area under a delta function is one, it follows
that the total area under $S_{XX}(f)$ is equal to $A^2/2$, as expected.

Figure 4.11 Power spectral density of sine wave with random phase; $\delta(f)$ denotes the
delta function at f = 0.


Random Binary Wave (continued) 

Consider again a random binary wave consisting of a sequence of 1s and 0s represented by
the values +A and -A respectively. In Example 3 we showed that the autocorrelation
function of this random process has the triangular form

$$
R_{XX}(\tau) = \begin{cases}
A^2\left(1 - \dfrac{|\tau|}{T}\right), & |\tau| < T \\
0, & |\tau| \ge T
\end{cases}
$$

The power spectral density of the process is therefore

$$
S_{XX}(f) = \int_{-T}^{T} A^2\left(1 - \frac{|\tau|}{T}\right)\exp(-j2\pi f\tau)\,d\tau
$$

Using the Fourier transform of a triangular function (see Table 2.2 of Chapter 2), we
obtain

$$
S_{XX}(f) = A^2 T\,\mathrm{sinc}^2(fT) \tag{4.51}
$$

which is plotted in Figure 4.12. Here again we see that the power spectral density is
nonnegative for all f and that it is an even function of f. Noting that $R_{XX}(0) = A^2$ and using
Property 3 of power spectral density, we find that the total area under $S_{XX}(f)$, or the
average power of the random binary wave described here, is $A^2$, which is intuitively satisfying.

Figure 4.12 Power spectral density of random binary wave.
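The claim that the area under (4.51) equals $A^2$ can be checked with a short numerical sketch. The values of `A` and `T`, the truncated integration band `f_max`, and the helper names are assumptions made for illustration; `sinc` is taken to be the normalized form $\sin(\pi x)/(\pi x)$. Because the $\mathrm{sinc}^2$ tails decay slowly, the truncated integral falls slightly short of $A^2$.

```python
import math

def sinc(x):
    """Normalized sinc, sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def psd_binary_wave(f, amplitude, T):
    """Power spectral density of the random binary wave, as in (4.51)."""
    return amplitude ** 2 * T * sinc(f * T) ** 2

A, T = 2.0, 0.5                  # assumed amplitude and symbol duration
f_max, n = 400.0, 400000         # assumed truncation band and number of midpoint samples
df = 2.0 * f_max / n
area = sum(psd_binary_wave(-f_max + (i + 0.5) * df, A, T) for i in range(n)) * df
print("area under S_XX(f):", area, " A^2:", A ** 2)
```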


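It is informative to generalize (4.51) so that it assumes a more broadly applicable form.
With this objective in mind, we first note that the energy spectral density (i.e., the squared
magnitude of the Fourier transform) of a rectangular pulse g(t) of amplitude A and
duration T is given by

$$
\mathcal{E}_g(f) = A^2 T^2\,\mathrm{sinc}^2(fT)
$$

We may therefore express (4.51) in terms of $\mathcal{E}_g(f)$ simply as

$$
S_{XX}(f) = \frac{\mathcal{E}_g(f)}{T} \tag{4.53}
$$

In words, (4.53) states:

The power spectral density of the random binary wave is equal to the energy spectral
density of the symbol-shaping pulse g(t) divided by the symbol duration T.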


Mixing of a Random Process with a Sinusoidal Process 

A situation that often arises in practice is that of mixing (i.e., multiplication) of a weakly
stationary process X(t) with a sinusoidal wave $\cos(2\pi f_c t + \Theta)$, where the phase $\Theta$ is a
random variable that is uniformly distributed over the interval $[0, 2\pi]$. The addition of the
random phase $\Theta$ in this manner merely recognizes the fact that the time origin is arbitrarily
chosen when both X(t) and $\cos(2\pi f_c t + \Theta)$ come from physically independent sources, as is
usually the case in practice. We are interested in determining the power spectral density of
the stochastic process

$$
Y(t) = X(t)\cos(2\pi f_c t + \Theta) \tag{4.54}
$$

Using the definition of autocorrelation function of a weakly stationary process and noting
that the random variable $\Theta$ is independent of X(t), we find that the autocorrelation function
of the process Y(t) is given by

$$
\begin{aligned}
R_{YY}(\tau) &= \mathbb{E}[Y(t+\tau)Y(t)] \\
&= \mathbb{E}[X(t+\tau)\cos(2\pi f_c t + 2\pi f_c \tau + \Theta)\,X(t)\cos(2\pi f_c t + \Theta)] \\
&= \mathbb{E}[X(t+\tau)X(t)]\,\mathbb{E}[\cos(2\pi f_c t + 2\pi f_c \tau + \Theta)\cos(2\pi f_c t + \Theta)] \\
&= \tfrac{1}{2}R_{XX}(\tau)\,\mathbb{E}[\cos(2\pi f_c \tau) + \cos(4\pi f_c t + 2\pi f_c \tau + 2\Theta)] \\
&= \tfrac{1}{2}R_{XX}(\tau)\cos(2\pi f_c \tau)
\end{aligned}
$$

Since the power spectral density of a weakly stationary process is the Fourier transform of
its autocorrelation function, we may go on to express the relationship between the power
spectral densities of the processes X(t) and Y(t) as follows:

$$
S_{YY}(f) = \tfrac{1}{4}\left[S_{XX}(f - f_c) + S_{XX}(f + f_c)\right] \tag{4.56}
$$
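The relation between the two spectra can be checked numerically under an assumed example. The sketch below takes $R_{XX}(\tau) = \exp(-|\tau|)$, so that $S_{XX}(f) = 2/(1 + (2\pi f)^2)$, forms $R_{YY}(\tau) = \tfrac{1}{2}R_{XX}(\tau)\cos(2\pi f_c\tau)$ as derived above, Fourier-transforms it by direct numerical integration, and compares the result with the shifted-and-scaled form (4.56). The carrier frequency `FC`, the integration settings, and the function names are hypothetical.

```python
import math

FC = 3.0   # assumed carrier frequency in Hz

def r_xx(tau):
    return math.exp(-abs(tau))

def s_xx(f):
    return 2.0 / (1.0 + (2.0 * math.pi * f) ** 2)

def r_yy(tau):
    """Autocorrelation of the mixer output, from the derivation preceding (4.56)."""
    return 0.5 * r_xx(tau) * math.cos(2.0 * math.pi * FC * tau)

def s_yy_numeric(f, tau_max=40.0, n=80000):
    """Fourier transform of the even function R_YY via a midpoint cosine integral."""
    step = tau_max / n
    total = 0.0
    for i in range(n):
        tau = (i + 0.5) * step
        total += r_yy(tau) * math.cos(2.0 * math.pi * f * tau)
    return 2.0 * total * step

def s_yy_shifted(f):
    """Right-hand side of (4.56)."""
    return 0.25 * (s_xx(f - FC) + s_xx(f + FC))

errors = [abs(s_yy_numeric(f) - s_yy_shifted(f)) for f in (0.0, 2.5, 3.0, 4.0)]
print("max deviation from (4.56):", max(errors))
```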

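Equation (4.56) teaches us that the power spectral density of the stochastic process Y(t)
defined in (4.54) can be obtained as follows:

Shift the given power spectral density $S_{XX}(f)$ of the process X(t) to the right by $f_c$, shift
it to the left by $f_c$, add the two shifted power spectra, and then divide the result by 4.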


Let $S_{YY}(f)$ denote the power spectral density of the output stochastic process Y(t)
obtained by passing the weakly stationary process X(t) through a linear time-invariant
filter of frequency response H(f). Then, by definition, recognizing that the power spectral
density of a weakly stationary process is equal to the Fourier transform of its
autocorrelation function and using (4.32), we obtain

$$
\begin{aligned}
S_{YY}(f) &= \int_{-\infty}^{\infty} R_{YY}(\tau)\exp(-j2\pi f\tau)\,d\tau \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
h(\tau_1)h(\tau_2)\,R_{XX}(\tau + \tau_1 - \tau_2)\exp(-j2\pi f\tau)\,d\tau_1\,d\tau_2\,d\tau
\end{aligned} \tag{4.57}
$$

Let $\tau + \tau_1 - \tau_2 = \tau_0$, or equivalently $\tau = \tau_0 - \tau_1 + \tau_2$. By making this substitution into
(4.57), we find that $S_{YY}(f)$ may be expressed as the product of three terms:

• the frequency response H(f) of the filter;

• the complex conjugate of H(f); and

• the power spectral density $S_{XX}(f)$ of the input process X(t).

We may thus simplify (4.57) as shown by

$$
S_{YY}(f) = H(f)\,H^*(f)\,S_{XX}(f)
$$

Since $|H(f)|^2 = H(f)H^*(f)$, we finally find that the relationship among the power spectral
densities of the input and output processes is expressed in the frequency domain by

$$
S_{YY}(f) = |H(f)|^2\,S_{XX}(f) \tag{4.59}
$$

Equation (4.59) states:

The power spectral density of the output process Y(t) is equal to the power spectral
density of the input process X(t) multiplied by the squared magnitude response of the
filter.

By using (4.59), we can therefore determine the effect of passing a weakly stationary
process through a stable, linear time-invariant filter. In computational terms, (4.59) is
obviously easier to handle than its time-domain counterpart of (4.32) that involves the
autocorrelation function.
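The equivalence of the time-domain route (4.32) and the frequency-domain route (4.59) can be sketched in discrete time, where sums replace integrals and the frequency f is normalized to the sampling rate. The FIR taps and the short input autocorrelation used below are assumptions chosen so that every sum is finite; they are not taken from the text.

```python
import cmath
import math

h = [1.0, -0.5, 0.25]            # assumed FIR impulse response

def r_xx(k):
    """Assumed input autocorrelation with support |k| <= 1."""
    return {0: 1.0, 1: 0.4, -1: 0.4}.get(k, 0.0)

def s_xx(f):
    """Transform of r_xx: 1 + 2*0.4*cos(2*pi*f), nonnegative for every f."""
    return 1.0 + 0.8 * math.cos(2.0 * math.pi * f)

def r_yy(k):
    """Discrete analogue of (4.32)."""
    return sum(h[i] * h[j] * r_xx(k + i - j)
               for i in range(len(h)) for j in range(len(h)))

def s_yy_from_autocorrelation(f):
    # r_yy(k) vanishes for |k| > len(h) because r_xx has support |k| <= 1.
    K = len(h)
    return sum(r_yy(k) * cmath.exp(-2j * math.pi * f * k) for k in range(-K, K + 1)).real

def s_yy_from_filter(f):
    """Discrete analogue of (4.59)."""
    H = sum(hk * cmath.exp(-2j * math.pi * f * n) for n, hk in enumerate(h))
    return abs(H) ** 2 * s_xx(f)

gaps = [abs(s_yy_from_autocorrelation(f) - s_yy_from_filter(f))
        for f in (0.0, 0.1, 0.25, 0.4, 0.5)]
print("largest gap between the two routes:", max(gaps))
```

The frequency-domain route needs a single product per frequency, whereas the time-domain route needs a double sum per lag followed by a transform, which illustrates the computational remark made above.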


At this point in the discussion, a basic question that comes to mind is the following:

Given a function $\rho_{XX}(\tau)$ of the time shift $\tau$, how do we know that $\rho_{XX}(\tau)$ is a legitimate
normalized autocorrelation function of a weakly stationary process X(t)?

The answer to this question is embodied in a theorem that was first proved by Wiener
(1930) and at a later date by Khintchine (1934). Formally, the Wiener-Khintchine
theorem states:

A necessary and sufficient condition for $\rho_{XX}(\tau)$ to be the normalized autocorrelation
function of a weakly stationary process X(t) is that there exists a distribution
function $F_{XX}(f)$ such that for all possible values of the time shift $\tau$, the function
$\rho_{XX}(\tau)$ may be expressed in terms of the well-known Fourier-Stieltjes theorem,
defined by

$$
\rho_{XX}(\tau) = \int_{-\infty}^{\infty} \exp(j2\pi f\tau)\,dF_{XX}(f) \tag{4.60}
$$

The Wiener-Khintchine theorem described in (4.60) is of fundamental importance to a
theoretical treatment of weakly stationary processes.

Referring back to the definition of the spectral distribution function $F_{XX}(f)$ given in
(4.49), we may express the integrated spectrum $dF_{XX}(f)$ as

$$
dF_{XX}(f) = p_{XX}(f)\,df
$$

which may be interpreted as the probability of X(t) contained in the frequency interval
$[f, f + df]$. Hence, we may rewrite (4.60) in the equivalent form

$$
\rho_{XX}(\tau) = \int_{-\infty}^{\infty} p_{XX}(f)\exp(j2\pi f\tau)\,df \tag{4.62}
$$

which expresses $\rho_{XX}(\tau)$ as the inverse Fourier transform of $p_{XX}(f)$. At this point, we
proceed by taking three steps:

1. Substitute (4.14) for $\rho_{XX}(\tau)$ on the left-hand side of (4.62).
2. Substitute (4.48) for $p_{XX}(f)$ inside the integral on the right-hand side of (4.62).
3. Use Property 3 of power spectral density in Section 4.7.

The end result of these three steps is the reformulation of (4.62) as shown by

$$
\frac{R_{XX}(\tau)}{R_{XX}(0)} = \int_{-\infty}^{\infty} \frac{S_{XX}(f)}{R_{XX}(0)}\exp(j2\pi f\tau)\,df
$$

Hence, canceling out the common term $R_{XX}(0)$, we obtain

$$
R_{XX}(\tau) = \int_{-\infty}^{\infty} S_{XX}(f)\exp(j2\pi f\tau)\,df
$$

which is a rewrite of (4.43). We may argue, therefore, that basically the two
Wiener-Khintchine equations follow from either one of the following two approaches:

1. The definition of the power spectral density in (4.38), obtained by passing the weakly
stationary process through a linear time-invariant filter.
2. The Wiener-Khintchine theorem described in (4.60).


Another Definition of the Power Spectral Density 


Equation (4.38) provides one definition of the power spectral density $S_{XX}(f)$ of a weakly
stationary process X(t); that is, $S_{XX}(f)$ is the Fourier transform of the autocorrelation
function $R_{XX}(\tau)$ of the process X(t). We arrived at this definition by working on the
mean-square value (i.e., average power) of the process Y(t) produced at the output of a linear
time-invariant filter, driven by a weakly stationary process X(t). In this section, we provide
another definition of the power spectral density by working on the process X(t) directly.
The definition so developed is not only mathematically satisfying, but it also provides
another way of interpreting the power spectral density.

Consider, then, a stochastic process X(t), which is known to be weakly stationary. Let
x(t) represent a sample function of the process X(t). For the sample function to be Fourier
transformable, it must be absolutely integrable; that is,

$$
\int_{-\infty}^{\infty} |x(t)|\,dt < \infty
$$

This condition can never be satisfied by any sample function x(t) of infinite duration. To
get around this problem, we consider a truncated segment of x(t) defined over the
observation interval $-T \le t \le T$, as illustrated in Figure 4.13, as shown by

$$
x_T(t) = \begin{cases}
x(t), & -T \le t \le T \\
0, & \text{otherwise}
\end{cases} \tag{4.64}
$$

Clearly, the truncated signal $x_T(t)$ has finite energy; therefore, it is Fourier transformable.
Let $X_T(f)$ denote the Fourier transform of $x_T(t)$, as shown by the transform pair:

$$
x_T(t) \rightleftharpoons X_T(f)
$$

Figure 4.13 Illustration of the truncation of a sample function x(t) for Fourier
transformability; the actual function x(t) extends beyond the observation interval
(-T, T) as shown by the dashed lines.

in light of which we may invoke Rayleigh’s energy theorem (Property 14 in Table 2.1) to
write

$$
\int_{-\infty}^{\infty} |x_T(t)|^2\,dt = \int_{-\infty}^{\infty} |X_T(f)|^2\,df
$$

Since (4.64) implies that

$$
\int_{-\infty}^{\infty} |x_T(t)|^2\,dt = \int_{-T}^{T} |x(t)|^2\,dt
$$

we may also apply Rayleigh’s energy theorem to the problem at hand as follows:

$$
\int_{-T}^{T} |x(t)|^2\,dt = \int_{-\infty}^{\infty} |X_T(f)|^2\,df \tag{4.65}
$$

With the two sides of (4.65) based on a single realization of the process X(t), they are both
subject to numerical variability (i.e., instability) as we go from one sample function of the
process X(t) to another. To mitigate this difficulty, we take the ensemble average of (4.65),
and thus write

$$
\mathbb{E}\left[\int_{-T}^{T} |x(t)|^2\,dt\right]
= \mathbb{E}\left[\int_{-\infty}^{\infty} |X_T(f)|^2\,df\right] \tag{4.66}
$$

What we have in (4.66) are two energy-based quantities. However, in the weakly
stationary process X(t), we have a process with some finite power. To put matters right, we
multiply both sides of (4.66) by the scaling factor 1/(2T) and take the limiting form of the
equation as the observation interval T approaches infinity. In so doing, we obtain

$$
\lim_{T\to\infty}\frac{1}{2T}\,\mathbb{E}\left[\int_{-T}^{T} |x(t)|^2\,dt\right]
= \lim_{T\to\infty}\mathbb{E}\left[\int_{-\infty}^{\infty} \frac{|X_T(f)|^2}{2T}\,df\right] \tag{4.67}
$$

The quantity on the left-hand side of (4.67) is now recognized as the average power of the
process X(t), denoted by $P_{\text{av}}$, which applies to all possible sample functions of the process
X(t). We may therefore recast (4.67) in the equivalent form

$$
P_{\text{av}} = \lim_{T\to\infty}\mathbb{E}\left[\int_{-\infty}^{\infty} \frac{|X_T(f)|^2}{2T}\,df\right] \tag{4.68}
$$

In (4.68), we next recognize that there are two mathematical operations of fundamental
interest:

1. Integration with respect to the frequency f.
2. Limiting operation with respect to the total observation interval 2T, followed by
ensemble averaging.

These two operations, viewed in a composite manner, result in a statistically stable
quantity defined by $P_{\text{av}}$. Therefore, it is permissible for us to interchange the order of the
two operations on the right-hand side of (4.68), recasting this equation in the desired form:

$$
P_{\text{av}} = \int_{-\infty}^{\infty}
\left\{\lim_{T\to\infty}\mathbb{E}\left[\frac{|X_T(f)|^2}{2T}\right]\right\}df \tag{4.69}
$$

With (4.69) at hand, we are now ready to formulate another definition for the power
spectral density as

$$
S_{XX}(f) = \lim_{T\to\infty}\mathbb{E}\left[\frac{|X_T(f)|^2}{2T}\right] \tag{4.70}
$$

This new definition has the following interpretation:

$S_{XX}(f)\,df$ is the average of the contributions to the total power from components of the
weakly stationary process X(t) with frequencies extending from f to f + df, where the
average is taken over all possible realizations of the process X(t).

This new interpretation of the power spectral density is all the more satisfying when (4.70)
is substituted into (4.68), yielding

$$
P_{\text{av}} = \int_{-\infty}^{\infty} S_{XX}(f)\,df
$$

which is immediately recognized as another way of describing Property 3 of the power
spectral density (i.e., (4.45)). End-of-chapter Problem 4.8 invites the reader to prove other
properties of the power spectral density, using the definition of (4.70).

One last comment must be carefully noted: in the definition of the power spectral 
density given in (4.70), it is not permissible to let the observation interval T approach 
infinity before taking the expectation; in other words, these two operations are not 
commutative. 
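The role of the expectation in (4.70) can be illustrated with a discrete-time sketch, in which a block of N samples stands in for the observation window of length 2T and the sum over samples stands in for the truncated Fourier transform. The white Gaussian input, the block length, the number of realizations, and the function names are all assumptions made for the illustration; for white noise of variance $\sigma^2$ the averaged quantity should settle near $\sigma^2$ at every frequency.

```python
import cmath
import math
import random

def truncated_transform(x, f):
    """Discrete stand-in for X_T(f): transform of one finite block of samples."""
    return sum(xn * cmath.exp(-2j * math.pi * f * n) for n, xn in enumerate(x))

def averaged_periodogram(f, n_samples, n_realizations, sigma, rng):
    """Estimate E[|X_N(f)|^2 / N] by averaging over independent realizations."""
    total = 0.0
    for _ in range(n_realizations):
        x = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
        total += abs(truncated_transform(x, f)) ** 2 / n_samples
    return total / n_realizations

rng = random.Random(7)
sigma = 1.5
estimates = {f: averaged_periodogram(f, 64, 400, sigma, rng) for f in (0.1, 0.25, 0.4)}
for f, value in estimates.items():
    print("f =", f, "estimate =", value, "sigma^2 =", sigma ** 2)
```

Without the averaging over realizations, the quantity $|X_N(f)|^2/N$ for a single block keeps fluctuating however long the block is made, which is consistent with the remark that the limit and the expectation may not be interchanged.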


Cross-spectral Densities 


Just as the power spectral density provides a measure of the frequency distribution of a
single weakly stationary process, cross-spectral densities provide measures of the
frequency interrelationships between two such processes. To be specific, let X(t) and Y(t)
be two jointly weakly stationary processes with their cross-correlation functions denoted
by $R_{XY}(\tau)$ and $R_{YX}(\tau)$. We define the corresponding cross-spectral densities $S_{XY}(f)$ and
$S_{YX}(f)$ of this pair of processes to be the Fourier transforms of their respective
cross-correlation functions, as shown by

$$
S_{XY}(f) = \int_{-\infty}^{\infty} R_{XY}(\tau)\exp(-j2\pi f\tau)\,d\tau \tag{4.72}
$$

and

$$
S_{YX}(f) = \int_{-\infty}^{\infty} R_{YX}(\tau)\exp(-j2\pi f\tau)\,d\tau \tag{4.73}
$$

The cross-correlation functions and cross-spectral densities form Fourier-transform pairs.
Accordingly, using the formula for inverse Fourier transformation, we may also
respectively write

$$
R_{XY}(\tau) = \int_{-\infty}^{\infty} S_{XY}(f)\exp(j2\pi f\tau)\,df
$$

and

$$
R_{YX}(\tau) = \int_{-\infty}^{\infty} S_{YX}(f)\exp(j2\pi f\tau)\,df
$$

The cross-spectral densities $S_{XY}(f)$ and $S_{YX}(f)$ are not necessarily real functions of the
frequency f. However, substituting the following relationship (i.e., Property 2 of the
autocorrelation function)

$$
R_{XY}(\tau) = R_{YX}(-\tau)
$$

into (4.72) and then using (4.73), we find that $S_{XY}(f)$ and $S_{YX}(f)$ are related as follows:

$$
S_{XY}(f) = S_{YX}(-f) = S_{YX}^*(f)
$$

where the asterisk denotes complex conjugation.


Sum of Two Weakly Stationary Processes 

Suppose that the stochastic processes X(t) and Y(t) have zero mean and let their sum be
denoted by

$$
Z(t) = X(t) + Y(t)
$$

The problem is to determine the power spectral density of the process Z(t).

The autocorrelation function of Z(t) is given by the second-order moment

$$
\begin{aligned}
M_{ZZ}(t, u) &= \mathbb{E}[Z(t)Z(u)] \\
&= \mathbb{E}[(X(t) + Y(t))(X(u) + Y(u))] \\
&= \mathbb{E}[X(t)X(u)] + \mathbb{E}[X(t)Y(u)] + \mathbb{E}[Y(t)X(u)] + \mathbb{E}[Y(t)Y(u)] \\
&= M_{XX}(t, u) + M_{XY}(t, u) + M_{YX}(t, u) + M_{YY}(t, u)
\end{aligned}
$$

Defining $\tau = t - u$ and assuming the joint weak stationarity of the two processes, we
may go on to write

$$
R_{ZZ}(\tau) = R_{XX}(\tau) + R_{XY}(\tau) + R_{YX}(\tau) + R_{YY}(\tau) \tag{4.77}
$$

Accordingly, taking the Fourier transform of both sides of (4.77), we get

$$
S_{ZZ}(f) = S_{XX}(f) + S_{XY}(f) + S_{YX}(f) + S_{YY}(f) \tag{4.78}
$$

This equation shows that the cross-spectral densities $S_{XY}(f)$ and $S_{YX}(f)$ represent the
spectral components that must be added to the individual power spectral densities of a pair
of correlated weakly stationary processes in order to obtain the power spectral density of
their sum.

When the stationary processes X(t) and Y(t) are uncorrelated, the cross-spectral
densities $S_{XY}(f)$ and $S_{YX}(f)$ are zero, in which case (4.78) reduces to

$$
S_{ZZ}(f) = S_{XX}(f) + S_{YY}(f)
$$
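A simple way to see the role of the cross terms is to look at average powers, which by Property 3 are the areas under the corresponding spectral densities. The sketch below is only an illustration under assumptions: it draws two independent zero-mean Gaussian sequences with hypothetical standard deviations, and checks that the power of the sum equals the sum of the individual powers plus twice the cross power, with the cross power close to zero because the sequences are uncorrelated.

```python
import random

def average_power(x):
    """Time-averaged power of a sequence."""
    return sum(v * v for v in x) / len(x)

def cross_power(x, y):
    """Time-averaged product of two sequences (zero-lag cross-correlation estimate)."""
    return sum(a * b for a, b in zip(x, y)) / len(x)

rng = random.Random(3)
n = 100000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]     # assumed process X, variance 1
y = [rng.gauss(0.0, 2.0) for _ in range(n)]     # assumed process Y, variance 4
z = [a + b for a, b in zip(x, y)]

p_x, p_y, p_z = average_power(x), average_power(y), average_power(z)
p_xy = cross_power(x, y)
print("P_Z =", p_z, " P_X + P_Y =", p_x + p_y, " cross power =", p_xy)
```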

We may generalize this latter result by stating:

When there is a multiplicity of zero-mean weakly stationary processes that are
uncorrelated with each other, the power spectral density of their sum is equal to the sum
of their individual power spectral densities.

Filtering of Two Jointly Weakly Stationary Processes

Consider next the problem of passing two jointly weakly stationary processes through a
pair of separate, stable, linear time-invariant filters, as shown in Figure 4.14. The
stochastic process X(t) is the input to the filter of impulse response $h_1(t)$, and the stochastic
process Y(t) is the input to the filter of the impulse response $h_2(t)$. Let V(t) and Z(t) denote
the processes at the respective filter outputs. The cross-correlation function of the output
processes V(t) and Z(t) is therefore defined by the second-order moment

$$
\begin{aligned}
M_{VZ}(t, u) &= \mathbb{E}[V(t)Z(u)] \\
&= \mathbb{E}\left[\int_{-\infty}^{\infty} h_1(\tau_1)X(t-\tau_1)\,d\tau_1
\int_{-\infty}^{\infty} h_2(\tau_2)Y(u-\tau_2)\,d\tau_2\right] \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h_1(\tau_1)h_2(\tau_2)\,
\mathbb{E}[X(t-\tau_1)Y(u-\tau_2)]\,d\tau_1\,d\tau_2 \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h_1(\tau_1)h_2(\tau_2)\,
M_{XY}(t-\tau_1, u-\tau_2)\,d\tau_1\,d\tau_2
\end{aligned} \tag{4.80}
$$

where $M_{XY}(t, u)$ is the cross-correlation function of X(t) and Y(t). Because the input
stochastic processes are jointly weakly stationary, by hypothesis, we may set $\tau = t - u$, and
thereby rewrite (4.80) as

$$
R_{VZ}(\tau) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
h_1(\tau_1)h_2(\tau_2)\,R_{XY}(\tau - \tau_1 + \tau_2)\,d\tau_1\,d\tau_2 \tag{4.81}
$$

Taking the Fourier transform of both sides of (4.81) and using a procedure similar to that
which led to the development of (4.39), we finally get

$$
S_{VZ}(f) = H_1(f)\,H_2^*(f)\,S_{XY}(f) \tag{4.82}
$$

where $H_1(f)$ and $H_2(f)$ are the frequency responses of the respective filters in Figure 4.14
and $H_2^*(f)$ is the complex conjugate of $H_2(f)$. This is the desired relationship between the
cross-spectral density of the output processes and that of the input processes. Note that
(4.82) includes (4.59) as a special case.

Figure 4.14 A pair of separate linear time-invariant filters.



The Poisson Process 


Flaving covered the basics of stochastic process theory, we now turn our attention to 
different kinds of stochastic processes that are commonly encountered in the study of 
communication systems. We begin the study with the Poisson process, which is the 
simplest process dealing with the issue of counting the number of occurrences of random 
events. 
Figure 4.15 Sample function of a Poisson counting process.


Consider, for example, a situation in which events occur at random instants of time,
such that the average rate of events per second is equal to $\lambda$. The sample path of such a
random process is illustrated in Figure 4.15, where $\tau_i$ denotes the occurrence time of the
ith event with i = 1, 2, .... Let N(t) be the number of event occurrences in the time interval
[0, t]. As illustrated in Figure 4.15, we see that N(t) is a nondecreasing, integer-valued,
continuous-time process. Let $p_{k,\tau}$ denote the probability that exactly k events occur during an
interval of duration $\tau$; that is,

$$p_{k,\tau} = \mathbb{P}[N(t, t+\tau) = k]$$

With this background, we may now formally define the Poisson process: 


Property 1: Time Homogeneity

The probability $p_{k,\tau}$ of k event occurrences is the same for all intervals of the same
duration $\tau$.

The essence of Property 1 is that the events are equally likely at all times.

Property 2: Distribution Function

The number of event occurrences, N(t), in the interval [0, t] has a distribution function with
mean $\lambda t$, defined by

$$\mathbb{P}[N(t) = k] = \frac{(\lambda t)^k}{k!}\exp(-\lambda t), \quad k = 0, 1, 2, \ldots$$

That is, the time between events is exponentially distributed. 

From Chapter 3, this distribution function is recognized to be the Poisson distribution. 
It is for this reason that N(t) is called the Poisson process. 
Property 3: Independence

The numbers of events in nonoverlapping time intervals are statistically independent,
regardless of how small or large the intervals happen to be and no matter how close or
distant they could be.

Property 3 is the most distinguishing property of the Poisson process. To illustrate the
significance of this property, let $[t_i, u_i]$ for i = 1, 2, ..., k denote k disjoint intervals on the
line $[0, \infty)$. We may then write

$$\mathbb{P}[N(t_1,u_1)=n_1;\; N(t_2,u_2)=n_2;\; \ldots;\; N(t_k,u_k)=n_k] = \prod_{i=1}^{k}\mathbb{P}[N(t_i,u_i)=n_i]$$

The important point to take from this discussion is that these three properties provide a 
complete characterization of the Poisson process. 
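
The three properties can be checked numerically with a short simulation. The sketch below is illustrative only: the function names, the rate of 2 events per second, the interval length of 5 s, and the seed are all assumptions chosen for the example. It draws exponential inter-arrival times, counts the events falling in [0, t], and compares the sample statistics with the Poisson distribution of Property 2.

```python
import math
import random


def poisson_pmf(k, rate, t):
    """P[N(t) = k] for a Poisson process with average rate `rate` (events/s)."""
    mean = rate * t
    return mean ** k / math.factorial(k) * math.exp(-mean)


def count_events(rate, t, rng):
    """Count events in [0, t] by accumulating exponential inter-arrival times."""
    count = 0
    elapsed = rng.expovariate(rate)
    while elapsed <= t:
        count += 1
        elapsed += rng.expovariate(rate)
    return count


def simulate_counts(rate, t, trials, seed=0):
    """Return `trials` independent realizations of N(t)."""
    rng = random.Random(seed)
    return [count_events(rate, t, rng) for _ in range(trials)]


# Hypothetical parameters chosen for illustration.
rate, t = 2.0, 5.0
counts = simulate_counts(rate, t, trials=20000)
sample_mean = sum(counts) / len(counts)
empirical_p10 = counts.count(10) / len(counts)

print("sample mean of N(t):", sample_mean, " theory:", rate * t)
print("P[N(t) = 10] estimate:", empirical_p10, " theory:", poisson_pmf(10, rate, t))
```

With enough trials the sample mean should settle near $\lambda t$ and the relative frequency of each count near the corresponding Poisson probability.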

This kind of stochastic process arises, for example, in the statistical characterization of 
a special kind of noise called shot noise in electronic devices (e.g., diodes and transistors), 
which arises due to the discrete nature of current flow. 

The Gaussian Process 


The second stochastic process of interest is the Gaussian process, which builds on the 
Gaussian distribution discussed in Chapter 3. The Gaussian process is by far the most 
frequently encountered random process in the study of communication systems. We say so 
for two reasons: practical applicability and mathematical tractability. 

Let us suppose that we observe a stochastic process X(t) for an interval that starts at 
time t = 0 and lasts until t = T. Suppose also that we weight the process X(t) by some 
function g(t) and then integrate the product g(t)X(t) over the observation interval [0, T],
thereby obtaining the random variable

$$Y = \int_0^T g(t)X(t)\,dt \tag{4.86}$$

We refer to Y as a linear functional of X(t). The distinction between a function and a
functional should be carefully noted. For example, the sum $Y = \sum_{i=1}^{N} a_i X_i$, where the $a_i$
are constants and the $X_i$ are random variables, is a linear function of the $X_i$; for each
observed set of values for the random variables $X_i$, we have a corresponding value for the
random variable Y. On the other hand, the value of the random variable Y in (4.86) depends
on the course of the integrand function g(t)X(t) over the entire observation interval from 0 
to T. Thus, a functional is a quantity that depends on the entire course of one or more 
functions rather than on a number of discrete variables. In other words, the domain of a 
functional is a space of admissible functions rather than a region of coordinate space. 

If, in (4.86), the weighting function g(t) is such that the mean-square value of the 
random variable Y is finite and if the random variable Y is a Gaussian-distributed random 
variable for every g(t) in this class of functions, then the process X(t) is said to be a 
Gaussian process. In words, we may state: 
From Chapter 3 we recall that the random variable Y has a Gaussian distribution if its
probability density function has the form

$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right) \tag{4.87}$$

where $\mu$ is the mean and $\sigma^2$ is the variance of the random variable Y. The distribution of a
Gaussian process X(t), sampled at some fixed time $t_k$, say, satisfies (4.87).

From a theoretical as well as practical perspective, a Gaussian process has two main 
virtues: 


1. The Gaussian process has many properties that make analytic results possible; we
   will discuss these properties later in the section.

2. The stochastic processes produced by physical phenomena are often such that a
   Gaussian model is appropriate. Furthermore, the use of a Gaussian model to describe
   physical phenomena is often confirmed by experiments. Last, but by no means least,
   the central limit theorem (discussed in Chapter 3) provides mathematical justification
   for the Gaussian distribution.


Thus, the frequent occurrence of physical phenomena for which a Gaussian model is 
appropriate and the ease with which a Gaussian process is handled mathematically make 
the Gaussian process very important in the study of communication systems. 


Property 1: Linear Filtering

If a Gaussian process X(t) is applied to a stable linear filter, then the stochastic process 
Y(t) developed at the output of the filter is also Gaussian. 

This property is readily derived by using the definition of a Gaussian process based on 
(4.86). Consider the situation depicted in Figure 4.8, where we have a linear time-invariant
filter of impulse response h(t), with the stochastic process X(t) as input and the stochastic
process Y(t) as output. We assume that X(t) is a Gaussian process. The process Y(t) is
related to X(t) by the convolution integral

$$Y(t) = \int_0^T h(t-\tau)X(\tau)\,d\tau, \quad 0 \le t < \infty$$

We assume that the impulse response h(t) is such that the mean-square value of the output
random process Y(t) is finite for all time t in the range $0 \le t < \infty$, for which the process
Y(t) is defined. To demonstrate that the output process Y(t) is Gaussian, we must show that
any linear functional of it is also a Gaussian random variable. That is, if we define the
random variable

$$Z = \int_0^{\infty} g_Y(t)Y(t)\,dt = \int_0^{\infty} g_Y(t)\left[\int_0^T h(t-\tau)X(\tau)\,d\tau\right]dt \tag{4.89}$$


then Z must be a Gaussian random variable for every function $g_Y(t)$, such that the mean-square
value of Z is finite. The two operations performed in the right-hand side of (4.89)
are both linear; therefore, it is permissible to interchange the order of integrations,
obtaining

$$Z = \int_0^T g(\tau)X(\tau)\,d\tau$$

where the new function

$$g(\tau) = \int_0^{\infty} g_Y(t)h(t-\tau)\,dt$$


Since X(t) is a Gaussian process by hypothesis, it follows from (4.91) that Z must also be a 
Gaussian random variable. We have thus shown that if the input X(t) to a linear filter is a 
Gaussian process, then the output Y(t) is also a Gaussian process. Note, however, that 
although our proof was carried out assuming a time-invariant linear filter, this property is 
also true for any arbitrary stable linear filter. 


Property 2: Multivariate Distribution


Consider the set of random variables $X(t_1), X(t_2), \ldots, X(t_n)$, obtained by sampling a
stochastic process X(t) at times $t_1, t_2, \ldots, t_n$. If the process X(t) is Gaussian, then this set of
random variables is jointly Gaussian for any n, with their n-fold joint probability density
function being completely determined by specifying the set of means

$$\mu_{X(t_i)} = \mathbb{E}[X(t_i)], \quad i = 1, 2, \ldots, n$$

and the set of covariance functions

$$C_X(t_k, t_i) = \mathbb{E}[(X(t_k) - \mu_{X(t_k)})(X(t_i) - \mu_{X(t_i)})], \quad k, i = 1, 2, \ldots, n$$

Let the n-by-1 vector $\mathbf{X}$ denote the set of random variables $X(t_1), X(t_2), \ldots, X(t_n)$ derived
from the Gaussian process X(t) by sampling it at times $t_1, t_2, \ldots, t_n$. Let the vector $\mathbf{x}$ denote
a sample value of $\mathbf{X}$. According to Property 2, the random vector $\mathbf{X}$ has a multivariate
Gaussian distribution, defined in matrix form as

$$f_{X(t_1),X(t_2),\ldots,X(t_n)}(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}\Delta^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] \tag{4.94}$$

where the superscript T denotes matrix transposition, the mean vector

$$\boldsymbol{\mu} = [\mu_1, \mu_2, \ldots, \mu_n]^T$$

the covariance matrix

$$\boldsymbol{\Sigma} = \{C_X(t_k, t_i)\}_{k,i=1}^{n}$$

$\boldsymbol{\Sigma}^{-1}$ is the inverse of the covariance matrix $\boldsymbol{\Sigma}$, and $\Delta$ is the determinant of the covariance
matrix $\boldsymbol{\Sigma}$.

Property 2 is frequently used as the definition of a Gaussian process. However, this 
definition is more difficult to use than that based on (4.86) for evaluating the effects of 
filtering on a Gaussian process. 

Note also that the covariance matrix $\boldsymbol{\Sigma}$ is a symmetric nonnegative definite matrix. For a
nondegenerate Gaussian process, $\boldsymbol{\Sigma}$ is positive definite, in which case the covariance
matrix is invertible.
Property 3: Stationarity

If a Gaussian process is weakly stationary, then the process is also strictly stationary. 

This follows directly from Property 2. 

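Property 4: Independence

If the random variables $X(t_1), X(t_2), \ldots, X(t_n)$, obtained by respectively sampling a
Gaussian process X(t) at times $t_1, t_2, \ldots, t_n$, are uncorrelated, that is

$$\mathbb{E}[(X(t_k) - \mu_{X(t_k)})(X(t_i) - \mu_{X(t_i)})] = 0, \quad i \neq k$$

then these random variables are statistically independent.

The uncorrelatedness of $X(t_1), \ldots, X(t_n)$ means that the covariance matrix $\boldsymbol{\Sigma}$ is reduced
to a diagonal matrix, as shown by

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_n^2 \end{bmatrix}$$

where the 0s denote two sets of elements whose values are all zero, and the diagonal terms

$$\sigma_i^2 = \mathbb{E}[(X(t_i) - \mathbb{E}[X(t_i)])^2], \quad i = 1, 2, \ldots, n$$

Under this special condition, the multivariate Gaussian distribution described in (4.94)
simplifies to

$$f_{\mathbf{X}}(\mathbf{x}) = \prod_{i=1}^{n} f_{X_i}(x_i)$$

where $X_i = X(t_i)$ and

$$f_{X_i}(x_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left(-\frac{(x_i - \mu_{X_i})^2}{2\sigma_i^2}\right), \quad i = 1, 2, \ldots, n$$

In words, if the Gaussian random variables $X(t_1), X(t_2), \ldots, X(t_n)$ are uncorrelated, then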
they are statistically independent, which, in turn, means that the joint probability density 
function of this set of random variables is expressed as the product of the probability 
density functions of the individual random variables in the set. 
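
For n = 2 the factorization can be verified directly. The following sketch evaluates (4.94) with an explicit 2-by-2 inverse and compares it with the product of the two marginal densities; the function names and the numerical values of the means, variances, and covariance are hypothetical and serve only to illustrate the property.

```python
import math


def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bivariate_gaussian_pdf(x, mean, cov):
    """Multivariate Gaussian density for n = 2; cov = [[c11, c12], [c21, c22]]."""
    (c11, c12), (c21, c22) = cov
    det = c11 * c22 - c12 * c21
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    d1 = x[0] - mean[0]
    d2 = x[1] - mean[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), using the explicit 2x2 inverse.
    quad = (c22 * d1 * d1 - (c12 + c21) * d1 * d2 + c11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


# Hypothetical sample point, means, and variances.
x = (0.3, -1.2)
mean = (0.5, -1.0)
var1, var2 = 2.0, 0.5

product = gaussian_pdf(x[0], mean[0], var1) * gaussian_pdf(x[1], mean[1], var2)
uncorrelated = bivariate_gaussian_pdf(x, mean, [[var1, 0.0], [0.0, var2]])
correlated = bivariate_gaussian_pdf(x, mean, [[var1, 0.6], [0.6, var2]])

print("product of marginals:", product)
print("joint, diagonal covariance:", uncorrelated)
print("joint, covariance 0.6:", correlated)
```

The joint density with a diagonal covariance matrix coincides with the product of the marginals, whereas a nonzero off-diagonal term gives a different value.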


Noise 


The term noise is used customarily to designate unwanted signals that tend to disturb the 
transmission and processing of signals in communication systems, and over which we 
have incomplete control. In practice, we find that there are many potential sources of noise 
in a communication system. The sources of noise may be external to the system (e.g.,
atmospheric noise, galactic noise, man-made noise) or internal to the system. The second
category includes an important type of noise that arises from the phenomenon of
spontaneous fluctuations of current flow that is experienced in all electrical circuits. In a
physical context, the most common examples of the spontaneous fluctuation phenomenon
are shot noise, which, as stated in Section 4.10, arises because of the discrete nature of
current flow in electronic devices; and thermal noise, which is attributed to the random 
motion of electrons in a conductor. However, insofar as the noise analysis of 
communication systems is concerned, be they analog or digital, the analysis is customarily 
based on a source of noise called white-noise, which is discussed next. 


This source of noise is idealized, in that its power spectral density is assumed to be 
constant and, therefore, independent of the operating frequency. The adjective “white” is 
used in the sense that white light contains equal amounts of all frequencies within the 
visible band of electromagnetic radiation. We may thus make the statement: 


Clearly, white-noise can only be meaningful as an abstract mathematical concept; we say 
so because a constant power spectral density corresponds to an unbounded spectral 
distribution function and, therefore, infinite average power, which is physically 
nonrealizable. Nevertheless, the utility of white-noise is justified in the study of 
communication theory by virtue of the fact that it is used to model channel noise at the 
front end of a receiver. Typically, the receiver includes a filter whose frequency response is 
essentially zero outside a frequency band of some finite value. Consequently, when white- 
noise is applied to the model of such a receiver, there is no need to describe how the power 
spectral density $S_{WW}(f)$ falls off outside the usable frequency band of the receiver.

Let

$$S_{WW}(f) = \frac{N_0}{2} \quad \text{for all } f$$

as illustrated in Figure 4.16a. Since the autocorrelation function is the inverse Fourier
transform of the power spectral density in accordance with the Wiener-Khintchine
relations, it follows that for white-noise the autocorrelation function is

$$R_{WW}(\tau) = \frac{N_0}{2}\delta(\tau) \tag{4.101}$$

Hence, the autocorrelation function of white noise consists of a delta function weighted by
the factor $N_0/2$ and occurring at the time shift $\tau = 0$, as shown in Figure 4.16b.

Since $R_{WW}(\tau)$ is zero for $\tau \neq 0$, it follows that any two different samples of white noise
are uncorrelated no matter how closely together in time those two samples are taken. If the 
white noise is also Gaussian, then the two samples are statistically independent in 
accordance with Property 4 of the Gaussian process. In a sense, then, white Gaussian 
noise represents the ultimate in “randomness.” 

The utility of a white-noise process in the noise analysis of communication systems is 
parallel to that of an impulse function or delta function in the analysis of linear systems. 
Figure 4.16 Characteristics of white-noise: (a) power spectral density; (b) autocorrelation function.


Just as we may observe the effect of an impulse only after it has been passed through a 
linear system with a finite bandwidth, so it is with white noise whose effect is observed 
only after passing through a similar system. We may therefore state: 


Ideal Low-pass Filtered White Noise 

Suppose that a white Gaussian noise of zero mean and power spectral density $N_0/2$ is
applied to an ideal low-pass filter of bandwidth B and passband magnitude response of
one. The power spectral density of the noise N(t) appearing at the filter output, as shown in
Figure 4.17a, is therefore

$$S_{NN}(f) = \begin{cases} \dfrac{N_0}{2}, & -B < f < B \\ 0, & |f| > B \end{cases}$$

Since the autocorrelation function is the inverse Fourier transform of the power spectral
density, it follows that

$$
\begin{aligned}
R_{NN}(\tau) &= \int_{-B}^{B}\frac{N_0}{2}\exp(j2\pi f\tau)\,df \\
&= N_0 B\,\operatorname{sinc}(2B\tau)
\end{aligned}
$$

whose dependence on $\tau$ is plotted in Figure 4.17b. From this figure, we see that $R_{NN}(\tau)$
has the maximum value $N_0 B$ at the origin and it passes through zero at $\tau = \pm k/(2B)$, where
k = 1, 2, 3, ....

Since the input noise W(t) is Gaussian (by hypothesis), it follows that the band-limited
noise N(t) at the filter output is also Gaussian. Suppose, then, that N(t) is sampled at the
rate of 2B times per second. From Figure 4.17b, we see that the resulting noise samples are
uncorrelated and, being Gaussian, they are statistically independent. Accordingly, the joint
probability density function of a set of noise samples obtained in this way is equal to the
product of the individual probability density functions. Note that each such noise sample
has a mean of zero and variance of $N_0 B$.
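
The closed-form autocorrelation can be cross-checked by integrating the rectangular power spectral density numerically. The sketch below uses a midpoint rule; the function names and the values $N_0 = 2$ and B = 4 Hz are assumptions made for the illustration, and sinc(x) is taken to mean $\sin(\pi x)/(\pi x)$, the convention under which the zero crossings fall at $\tau = \pm k/(2B)$.

```python
import cmath
import math


def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def autocorr_lowpass_numeric(tau, n0, bandwidth, steps=20000):
    """Midpoint-rule inverse Fourier transform of S_NN(f) = N0/2 on (-B, B)."""
    df = 2.0 * bandwidth / steps
    total = 0.0 + 0.0j
    for i in range(steps):
        f = -bandwidth + (i + 0.5) * df
        total += (n0 / 2.0) * cmath.exp(2j * math.pi * f * tau) * df
    return total.real


def autocorr_lowpass_closed_form(tau, n0, bandwidth):
    """R_NN(tau) = N0 * B * sinc(2 * B * tau)."""
    return n0 * bandwidth * sinc(2.0 * bandwidth * tau)


# Hypothetical parameters chosen for illustration.
n0, bandwidth = 2.0, 4.0
first_zero = 1.0 / (2.0 * bandwidth)
for tau in (0.0, 0.05, first_zero, 0.3):
    numeric = autocorr_lowpass_numeric(tau, n0, bandwidth)
    closed = autocorr_lowpass_closed_form(tau, n0, bandwidth)
    print(f"tau = {tau:.4f}  numeric = {numeric:.6f}  closed form = {closed:.6f}")
```

The value at $\tau = 0$ is the variance $N_0 B$ of each sample, and the value at $\tau = 1/(2B)$ is zero, which is why samples taken at the rate 2B are uncorrelated.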
Figure 4.17 Characteristics of low-pass filtered white noise: (a) power spectral density;
(b) autocorrelation function.


Correlation of White Noise with Sinusoidal Wave 


Consider the sample function 

$$w'(t) = \sqrt{\frac{2}{T}}\int_0^T w(t)\cos(2\pi f_c t)\,dt \tag{4.104}$$

which is the output of a correlator with white Gaussian noise sample function w(t) and
sinusoidal wave $\sqrt{2/T}\cos(2\pi f_c t)$ as its two inputs; the scaling factor $\sqrt{2/T}$ is included
in (4.104) to make the sinusoidal wave input have unit energy over the interval $0 \le t \le T$.
With w(t) having zero mean, it immediately follows that the correlator output w'(t) has
zero mean too. The variance of the correlator output is therefore defined by

$$
\begin{aligned}
\sigma_{W'}^2 &= \mathbb{E}\left[\frac{2}{T}\int_0^T\!\!\int_0^T w(t_1)\cos(2\pi f_c t_1)\,w(t_2)\cos(2\pi f_c t_2)\,dt_1\,dt_2\right] \\
&= \frac{2}{T}\int_0^T\!\!\int_0^T \mathbb{E}[w(t_1)w(t_2)]\cos(2\pi f_c t_1)\cos(2\pi f_c t_2)\,dt_1\,dt_2 \\
&= \frac{2}{T}\int_0^T\!\!\int_0^T \frac{N_0}{2}\,\delta(t_1-t_2)\cos(2\pi f_c t_1)\cos(2\pi f_c t_2)\,dt_1\,dt_2
\end{aligned}
$$

where, in the last line, we made use of (4.101). We now invoke the sifting property of the
delta function, namely

$$\int_{-\infty}^{\infty} g(t)\delta(t)\,dt = g(0)$$

where g(t) is a continuous function of time that has the value g(0) at time t = 0. Hence, we
may further simplify the expression for the noise variance as

$$
\begin{aligned}
\sigma_{W'}^2 &= \frac{N_0}{2}\,\frac{2}{T}\int_0^T \cos^2(2\pi f_c t)\,dt \\
&= \frac{N_0}{2T}\int_0^T [1 + \cos(4\pi f_c t)]\,dt \\
&= \frac{N_0}{2}
\end{aligned}
$$

where, in the last line, it is assumed that the frequency $f_c$ of the sinusoidal wave input is an
integer multiple of the reciprocal of T for mathematical convenience. 
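
The role of the assumption on $f_c$ can be seen by evaluating the remaining integral numerically. The sketch below computes $(N_0/T)\int_0^T \cos^2(2\pi f_c t)\,dt$ with a midpoint rule; the function name and the values of $N_0$, T, and $f_c$ are hypothetical and chosen only for illustration.

```python
import math


def correlator_noise_variance(n0, duration, fc, steps=100000):
    """Evaluate (N0 / T) * integral_0^T cos^2(2 pi fc t) dt by the midpoint rule."""
    dt = duration / steps
    integral = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        integral += math.cos(2.0 * math.pi * fc * t) ** 2 * dt
    return (n0 / duration) * integral


# Hypothetical parameters chosen for illustration.
n0, duration = 3.0, 0.01

# fc is an integer multiple of 1/T: the variance equals N0/2.
var_integer = correlator_noise_variance(n0, duration, fc=5.0 / duration)

# fc is not an integer multiple of 1/T: a residual term remains.
var_fractional = correlator_noise_variance(n0, duration, fc=5.125 / duration)

print("fc = 5/T     :", var_integer, " N0/2 =", n0 / 2.0)
print("fc = 5.125/T :", var_fractional)
```

When $f_c$ is not an integer multiple of 1/T, the term $\cos(4\pi f_c t)$ no longer integrates to zero over [0, T], and the variance departs from $N_0/2$ by $(N_0/2)\sin(4\pi f_c T)/(4\pi f_c T)$, which becomes negligible when $f_c T$ is large.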


Narrowband Noise 


The receiver of a communication system usually includes some provision for 
preprocessing the received signal. Typically, the preprocessing takes the form of a 
narrowband filter whose bandwidth is just large enough to pass the modulated component 
of the received signal essentially undistorted, so as to limit the effect of channel noise 
passing through the receiver. The noise process appearing at the output of such a filter is 
called narrowband noise. With the spectral components of narrowband noise concentrated 
about some midband frequency $\pm f_c$ as in Figure 4.18a, we find that a sample function n(t)
of such a process appears somewhat similar to a sine wave of frequency $f_c$. The sample
function n(t) may, therefore, undulate slowly in both amplitude and phase, as illustrated in
Figure 4.18b.

Consider, then, the n(t) produced at the output of a narrowband filter in response to the
sample function w(t) of a white Gaussian noise process of zero mean and unit power spectral
density applied to the filter input; w(t) and n(t) are sample functions of the respective
processes W(t) and N(t). Let H(f) denote the transfer function of this filter. Accordingly,
we may express the power spectral density $S_{NN}(f)$ of the noise N(t) in terms of H(f) as

$$S_{NN}(f) = |H(f)|^2$$

On the basis of this equation, we may now make the following statement: 


In this section we wish to represent the narrowband noise n(t) in terms of its in-phase and
quadrature components in a manner similar to that described for a narrowband signal in 
Section 2.10. The derivation presented here is based on the idea of pre-envelope and related 
concepts, which were discussed in Chapter 2 on Fourier analysis of signals and systems. 




Figure 4.18 (a) Power spectral density of narrowband noise. (b) Sample function of
narrowband noise.
Let $n_+(t)$ and $\tilde{n}(t)$, respectively, denote the pre-envelope and complex envelope of the
narrowband noise n(t). We assume that the power spectrum of n(t) is centered about the
frequency $f_c$. Then we may write

$$n_+(t) = n(t) + j\hat{n}(t) \tag{4.109}$$

and

$$\tilde{n}(t) = n_+(t)\exp(-j2\pi f_c t) \tag{4.110}$$

where $\hat{n}(t)$ is the Hilbert transform of n(t). The complex envelope $\tilde{n}(t)$ may itself be
expressed as

$$\tilde{n}(t) = n_I(t) + jn_Q(t) \tag{4.111}$$

Hence, combining (4.109) through (4.111), we find that the in-phase component $n_I(t)$ and
the quadrature component $n_Q(t)$ of the narrowband noise n(t) are

$$n_I(t) = n(t)\cos(2\pi f_c t) + \hat{n}(t)\sin(2\pi f_c t) \tag{4.112}$$

and

$$n_Q(t) = \hat{n}(t)\cos(2\pi f_c t) - n(t)\sin(2\pi f_c t) \tag{4.113}$$

respectively. Eliminating $\hat{n}(t)$ between (4.112) and (4.113), we get the desired canonical
form for representing the narrowband noise n(t), as shown by

$$n(t) = n_I(t)\cos(2\pi f_c t) - n_Q(t)\sin(2\pi f_c t) \tag{4.114}$$

Using (4.112) to (4.114), we may now derive some important properties of the in-phase 
and quadrature components of a narrowband noise, as described next. 
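
Before turning to the properties, the algebra of (4.112) to (4.114) can be exercised on a deterministic test signal. The sketch below is illustrative only: it uses a single tone at frequency $f_c + \Delta$ in place of a noise sample function, relies on the fact that the Hilbert transform of a cosine at a positive frequency is the corresponding sine, and the names `fc`, `offset`, and the sampling grid are assumptions. The in-phase and quadrature components should come out as the slowly varying $\cos(2\pi\Delta t)$ and $\sin(2\pi\Delta t)$, and (4.114) should return the original signal.

```python
import math


def in_phase_quadrature(n, n_hat, fc, t):
    """Equations (4.112) and (4.113): components from n(t) and its Hilbert transform."""
    c = math.cos(2.0 * math.pi * fc * t)
    s = math.sin(2.0 * math.pi * fc * t)
    n_i = n * c + n_hat * s
    n_q = n_hat * c - n * s
    return n_i, n_q


def synthesize(n_i, n_q, fc, t):
    """Equation (4.114): canonical representation of the narrowband signal."""
    return (n_i * math.cos(2.0 * math.pi * fc * t)
            - n_q * math.sin(2.0 * math.pi * fc * t))


# Hypothetical test signal: a tone offset by 3 Hz from a 100 Hz midband frequency.
fc, offset = 100.0, 3.0
max_reconstruction_error = 0.0
max_component_error = 0.0
for k in range(200):
    t = k * 1.0e-3
    n = math.cos(2.0 * math.pi * (fc + offset) * t)
    n_hat = math.sin(2.0 * math.pi * (fc + offset) * t)  # Hilbert transform of the cosine
    n_i, n_q = in_phase_quadrature(n, n_hat, fc, t)
    max_reconstruction_error = max(max_reconstruction_error,
                                   abs(synthesize(n_i, n_q, fc, t) - n))
    max_component_error = max(max_component_error,
                              abs(n_i - math.cos(2.0 * math.pi * offset * t)),
                              abs(n_q - math.sin(2.0 * math.pi * offset * t)))

print("max reconstruction error:", max_reconstruction_error)
print("max component error:", max_component_error)
```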

Property 1: The in-phase component $n_I(t)$ and quadrature component $n_Q(t)$ of narrowband noise n(t)
have zero mean.

To prove this property, we first observe that the noise $\hat{n}(t)$ is obtained by passing n(t)
through a linear filter (i.e., Hilbert transformer). Accordingly, $\hat{n}(t)$ will have zero mean
because n(t) has zero mean by virtue of its narrowband nature. Furthermore, from (4.112)
and (4.113), we see that $n_I(t)$ and $n_Q(t)$ are weighted sums of n(t) and $\hat{n}(t)$. It follows,
therefore, that the in-phase and quadrature components, $n_I(t)$ and $n_Q(t)$, both have zero
mean.

Property 2: If the narrowband noise n(t) is Gaussian, then its in-phase component $n_I(t)$ and
quadrature component $n_Q(t)$ are jointly Gaussian.

To prove this property, we observe that $\hat{n}(t)$ is derived from n(t) by a linear filtering
operation. Hence, if n(t) is Gaussian, the Hilbert transform $\hat{n}(t)$ is also Gaussian, and
n(t) and $\hat{n}(t)$ are jointly Gaussian. It follows, therefore, that the in-phase and quadrature
components, $n_I(t)$ and $n_Q(t)$, are jointly Gaussian, since they are weighted sums of jointly
Gaussian processes.

Property 3: If the narrowband noise n(t) is weakly stationary, then its in-phase component $n_I(t)$ and
quadrature component $n_Q(t)$ are jointly weakly stationary.

If n(t) is weakly stationary, so is its Hilbert transform $\hat{n}(t)$. However, since the in-phase
and quadrature components, $n_I(t)$ and $n_Q(t)$, are both weighted sums of n(t) and $\hat{n}(t)$
and the weighting functions, $\cos(2\pi f_c t)$ and $\sin(2\pi f_c t)$, vary with time, we cannot directly
assert that $n_I(t)$ and $n_Q(t)$ are weakly stationary. To prove Property 3, we have to
evaluate their correlation functions.

Using (4.112) and (4.113), we find that the in-phase and quadrature components, $n_I(t)$
and $n_Q(t)$, of a narrowband noise n(t) have the same autocorrelation function, as shown by

$$R_{N_I N_I}(\tau) = R_{N_Q N_Q}(\tau) = R_{NN}(\tau)\cos(2\pi f_c\tau) + \hat{R}_{NN}(\tau)\sin(2\pi f_c\tau) \tag{4.115}$$

and their cross-correlation functions are given by

$$R_{N_I N_Q}(\tau) = -R_{N_Q N_I}(\tau) = R_{NN}(\tau)\sin(2\pi f_c\tau) - \hat{R}_{NN}(\tau)\cos(2\pi f_c\tau) \tag{4.116}$$

where $R_{NN}(\tau)$ is the autocorrelation function of n(t), and $\hat{R}_{NN}(\tau)$ is the Hilbert transform
of $R_{NN}(\tau)$. From (4.115) and (4.116), we readily see that the correlation functions
$R_{N_I N_I}(\tau)$, $R_{N_Q N_Q}(\tau)$, and $R_{N_I N_Q}(\tau)$ of the in-phase and quadrature components $n_I(t)$ and
$n_Q(t)$ depend only on the time shift $\tau$. This dependence, in conjunction with Property 1,
proves that $n_I(t)$ and $n_Q(t)$ are weakly stationary if the original narrowband noise n(t) is
weakly stationary.


Property 4: Both the in-phase noise $n_I(t)$ and quadrature noise $n_Q(t)$ have the same power spectral
density, which is related to the power spectral density $S_{NN}(f)$ of the original narrowband
noise n(t) as follows:

$$S_{N_I N_I}(f) = S_{N_Q N_Q}(f) = \begin{cases} S_{NN}(f-f_c) + S_{NN}(f+f_c), & -B \le f \le B \\ 0, & \text{otherwise} \end{cases} \tag{4.117}$$

where it is assumed that $S_{NN}(f)$ occupies the frequency interval $f_c - B \le |f| \le f_c + B$ and
$f_c > B$.

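To prove this fourth property, we take the Fourier transforms of both sides of (4.115),
and use the fact that

$$
\begin{aligned}
\mathbf{F}[\hat{R}_{NN}(\tau)] &= -j\,\operatorname{sgn}(f)\,\mathbf{F}[R_{NN}(\tau)] \\
&= -j\,\operatorname{sgn}(f)\,S_{NN}(f)
\end{aligned}
\tag{4.118}
$$

We thus obtain the result

$$
\begin{aligned}
S_{N_I N_I}(f) &= S_{N_Q N_Q}(f) \\
&= \tfrac{1}{2}[S_{NN}(f-f_c) + S_{NN}(f+f_c)] \\
&\quad - \tfrac{1}{2}[S_{NN}(f-f_c)\operatorname{sgn}(f-f_c) - S_{NN}(f+f_c)\operatorname{sgn}(f+f_c)] \\
&= \tfrac{1}{2}S_{NN}(f-f_c)[1 - \operatorname{sgn}(f-f_c)] + \tfrac{1}{2}S_{NN}(f+f_c)[1 + \operatorname{sgn}(f+f_c)]
\end{aligned}
\tag{4.119}
$$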

Now, with the power spectral density $S_{NN}(f)$ of the original narrowband noise n(t)
occupying the frequency interval $f_c - B \le |f| \le f_c + B$, where $f_c > B$, as illustrated in
Figure 4.19a, we find that the corresponding shapes of $S_{NN}(f - f_c)$ and $S_{NN}(f + f_c)$ are as in
Figures 4.19b and 4.19c respectively. Figures 4.19d, 4.19e, and 4.19f show the shapes of
$\operatorname{sgn}(f)$, $\operatorname{sgn}(f - f_c)$, and $\operatorname{sgn}(f + f_c)$ respectively. Accordingly, we may make the following
observations from Figure 4.19:

1. For frequencies defined by $-B \le f \le B$, we have

$$\operatorname{sgn}(f - f_c) = -1$$

and

$$\operatorname{sgn}(f + f_c) = +1$$

Hence, substituting these results into (4.119), we obtain

$$S_{N_I N_I}(f) = S_{N_Q N_Q}(f) = S_{NN}(f-f_c) + S_{NN}(f+f_c), \quad -B \le f \le B$$



Figure 4.19 (a) Power spectral density $S_{NN}(f)$ pertaining to narrowband noise n(t).
(b), (c) Frequency-shifted versions of $S_{NN}(f)$ in opposite directions.
(d) Signum function sgn(f).
(e), (f) Frequency-shifted versions of sgn(f) in opposite directions.






2. For $2f_c - B \le f \le 2f_c + B$, we have

$$\operatorname{sgn}(f - f_c) = 1$$

and

$$S_{NN}(f + f_c) = 0$$

with the result that $S_{N_I N_I}(f)$ and $S_{N_Q N_Q}(f)$ are both zero.

3. For $-2f_c - B \le f \le -2f_c + B$, we have

$$S_{NN}(f - f_c) = 0$$

and

$$\operatorname{sgn}(f + f_c) = -1$$

with the result that, here also, $S_{N_I N_I}(f)$ and $S_{N_Q N_Q}(f)$ are both zero.

4. Outside the frequency intervals defined in points 1, 2, and 3, both $S_{NN}(f - f_c)$ and
$S_{NN}(f + f_c)$ are zero, and in a corresponding way, $S_{N_I N_I}(f)$ and $S_{N_Q N_Q}(f)$ are also
zero.

Combining these results, we obtain the simple relationship defined in (4.117).

As a consequence of this property, we may extract the in-phase component $n_I(t)$ and
quadrature component $n_Q(t)$, except for scaling factors, from the narrowband noise n(t)
by using the scheme shown in Figure 4.20a, where both low-pass filters have a cutoff
frequency at B. The scheme shown in Figure 4.20a may be viewed as an analyzer. Given
the in-phase component $n_I(t)$ and the quadrature component $n_Q(t)$, we may generate the
narrowband noise n(t) using the scheme shown in Figure 4.20b, which may be viewed as a
synthesizer.

Property 5: The in-phase and quadrature components $n_I(t)$ and $n_Q(t)$ have the same variance as the
narrowband noise n(t).

This property follows directly from (4.117), according to which the total area under the
power spectral density curve of $n_I(t)$ or $n_Q(t)$ is the same as the total area under the power
spectral density curve of n(t). Hence, $n_I(t)$ and $n_Q(t)$ have the same mean-square value
as n(t). Earlier we showed that since n(t) has zero mean, then $n_I(t)$ and $n_Q(t)$ have zero
mean, too. It follows, therefore, that $n_I(t)$ and $n_Q(t)$ have the same variance as the
narrowband noise n(t).



Figure 4.20 (a) Extraction of in-phase and quadrature components of a narrowband process.
(b) Generation of a narrowband process from its in-phase and quadrature components.
Property 6: The cross-spectral densities of the in-phase and quadrature components of a narrowband
noise are purely imaginary, as shown by

$$S_{N_I N_Q}(f) = -S_{N_Q N_I}(f) = \begin{cases} j[S_{NN}(f+f_c) - S_{NN}(f-f_c)], & -B \le f \le B \\ 0, & \text{otherwise} \end{cases} \tag{4.120}$$

To prove this property, we take the Fourier transforms of both sides of (4.116), and use the
relation of (4.118), obtaining

$$
\begin{aligned}
S_{N_I N_Q}(f) &= -S_{N_Q N_I}(f) \\
&= -\tfrac{j}{2}[S_{NN}(f-f_c) - S_{NN}(f+f_c)] \\
&\quad + \tfrac{j}{2}[S_{NN}(f-f_c)\operatorname{sgn}(f-f_c) + S_{NN}(f+f_c)\operatorname{sgn}(f+f_c)] \\
&= \tfrac{j}{2}S_{NN}(f+f_c)[1 + \operatorname{sgn}(f+f_c)] - \tfrac{j}{2}S_{NN}(f-f_c)[1 - \operatorname{sgn}(f-f_c)]
\end{aligned}
\tag{4.121}
$$

Following a procedure similar to that described for proving Property 4, we find that 
(4.121) reduces to the form shown in (4.120). 
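
The effect of spectral symmetry on (4.120) can be illustrated with two model spectra. In the sketch below, `flat_psd` is symmetric about $\pm f_c$, while `tilted_psd` is a hypothetical asymmetric spectrum that rises linearly across the band; both are even functions of f, as required of the power spectral density of a real-valued process. The function names and parameter values are assumptions made for the example.

```python
def cross_spectral_density(psd, f, fc, bandwidth):
    """S_{N_I N_Q}(f) according to (4.120); `psd` is a callable returning S_NN(f)."""
    if abs(f) > bandwidth:
        return 0j
    return 1j * (psd(f + fc) - psd(f - fc))


def flat_psd(f, fc=100.0, bandwidth=10.0, level=0.5):
    """Band-pass spectrum that is locally symmetric about +fc and -fc."""
    return level if abs(abs(f) - fc) <= bandwidth else 0.0


def tilted_psd(f, fc=100.0, bandwidth=10.0, level=0.5):
    """Hypothetical band-pass spectrum that is not symmetric about +fc and -fc."""
    offset = abs(f) - fc
    if abs(offset) > bandwidth:
        return 0.0
    return level * (1.0 + 0.5 * offset / bandwidth)


for f in (-5.0, 0.0, 5.0, 20.0):
    print(f, cross_spectral_density(flat_psd, f, 100.0, 10.0),
          cross_spectral_density(tilted_psd, f, 100.0, 10.0))
```

For the symmetric spectrum the cross-spectral density vanishes at every frequency; for the tilted spectrum it is purely imaginary and odd in f.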


Property 7: If a narrowband noise n(t) is Gaussian with zero mean and a power spectral density
$S_{NN}(f)$ that is locally symmetric about the midband frequency $\pm f_c$, then the in-phase noise
$n_I(t)$ and the quadrature noise $n_Q(t)$ are statistically independent.

To prove this property, we observe that if $S_{NN}(f)$ is locally symmetric about $\pm f_c$, then

$$S_{NN}(f - f_c) = S_{NN}(f + f_c), \quad -B \le f \le B$$

Consequently, we find from (4.120) that the cross-spectral densities of the in-phase and
quadrature components, $n_I(t)$ and $n_Q(t)$, are zero for all frequencies. This, in turn, means
that the cross-correlation functions $R_{N_I N_Q}(\tau)$ and $R_{N_Q N_I}(\tau)$ are zero for all $\tau$, as shown by

$$\mathbb{E}[N_I(t_k + \tau)N_Q(t_k)] = 0$$

which implies that the random variables $N_I(t_k + \tau)$ and $N_Q(t_k)$ (obtained by observing the
in-phase component at time $t_k + \tau$ and observing the quadrature component at time $t_k$
respectively) are orthogonal for all $\tau$.

The narrowband noise n(t) is assumed to be Gaussian with zero mean; hence, from
Properties 1 and 2 it follows that both $N_I(t_k + \tau)$ and $N_Q(t_k)$ are also Gaussian with zero
mean. We thus conclude that because $N_I(t_k + \tau)$ and $N_Q(t_k)$ are orthogonal and have zero
mean, they are uncorrelated, and being Gaussian, they are statistically independent for all
$\tau$. In other words, the in-phase component $n_I(t)$ and the quadrature component $n_Q(t)$ are
statistically independent.

In light of Property 7, we may express the joint probability density function of the
random variables $N_I(t_k + \tau)$ and $N_Q(t_k)$ (for any time shift $\tau$) as the product of their
individual probability density functions, as shown by

$$
\begin{aligned}
f_{N_I(t_k+\tau),\,N_Q(t_k)}(n_I, n_Q) &= f_{N_I(t_k+\tau)}(n_I)\,f_{N_Q(t_k)}(n_Q) \\
&= \frac{1}{2\pi\sigma^2}\exp\left(-\frac{n_I^2 + n_Q^2}{2\sigma^2}\right)
\end{aligned}
\tag{4.124}
$$

where $\sigma^2$ is the variance of the original narrowband noise n(t). Equation (4.124) holds if,
and only if, the spectral density $S_{NN}(f)$ of n(t) is locally symmetric about $\pm f_c$. Otherwise,
this relation holds only for $\tau = 0$ or those values of $\tau$ for which $n_I(t)$ and $n_Q(t)$ are
uncorrelated.


To sum up, if the narrowband noise n(t) is zero mean, weakly stationary, and Gaussian,
then its in-phase and quadrature components $n_I(t)$ and $n_Q(t)$ are both zero mean, jointly
stationary, and jointly Gaussian. To evaluate the power spectral density of $n_I(t)$ or $n_Q(t)$,
we may proceed as follows:

1. Shift the positive frequency portion of the power spectral density $S_{NN}(f)$ of the
   original narrowband noise n(t) to the left by $f_c$.

2. Shift the negative frequency portion of $S_{NN}(f)$ to the right by $f_c$.

3. Add these two shifted spectra to obtain the desired $S_{N_I N_I}(f)$ or $S_{N_Q N_Q}(f)$.
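
The shift-and-add procedure can be sketched in a few lines. The example below applies it to a hypothetical ideal band-pass spectrum of height $N_0/2$ (with $N_0 = 2$, $f_c = 100$ Hz, and B = 10 Hz chosen arbitrarily) and also compares the areas under the two spectra, which should agree because the in-phase component has the same variance as the narrowband noise. The function names are assumptions introduced for the illustration.

```python
def in_phase_psd(psd, f, fc, bandwidth):
    """S_{N_I N_I}(f) = S_NN(f - fc) + S_NN(f + fc) for |f| <= B, zero otherwise."""
    if abs(f) > bandwidth:
        return 0.0
    # psd(f + fc): positive-frequency portion shifted left by fc.
    # psd(f - fc): negative-frequency portion shifted right by fc.
    return psd(f - fc) + psd(f + fc)


def bandpass_psd(f, n0=2.0, fc=100.0, bandwidth=10.0):
    """Hypothetical ideal band-pass spectrum of height N0/2 around +fc and -fc."""
    return n0 / 2.0 if abs(abs(f) - fc) <= bandwidth else 0.0


def area(func, lo, hi, steps):
    """Midpoint-rule integral of func over [lo, hi]."""
    df = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * df) for i in range(steps)) * df


fc, bandwidth = 100.0, 10.0
power_narrowband = area(bandpass_psd, -150.0, 150.0, 30000)
power_in_phase = area(lambda f: in_phase_psd(bandpass_psd, f, fc, bandwidth),
                      -150.0, 150.0, 30000)

print("S_NINI(0) =", in_phase_psd(bandpass_psd, 0.0, fc, bandwidth))
print("area under S_NN   :", power_narrowband)
print("area under S_NINI :", power_in_phase)
```

For this spectrum the in-phase power spectral density is flat at $N_0$ over $-B \le f \le B$, twice the height of the original passbands and half their total width.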


Ideal Band-pass Filtered White Noise 

Consider a white Gaussian noise of zero mean and power spectral density $N_0/2$, which is
passed through an ideal band-pass filter of passband magnitude response equal to one,
midband frequency $f_c$, and bandwidth 2B. The power spectral density characteristic of the
filtered noise n(t) is, therefore, as shown in Figure 4.21a. The problem is to determine the
autocorrelation functions of n(t) and those of its in-phase and quadrature components.

The autocorrelation function of n(t) is the inverse Fourier transform of the power
spectral density characteristic shown in Figure 4.21a, as shown by

$$
\begin{aligned}
R_{NN}(\tau) &= \int_{-f_c-B}^{-f_c+B}\frac{N_0}{2}\exp(j2\pi f\tau)\,df + \int_{f_c-B}^{f_c+B}\frac{N_0}{2}\exp(j2\pi f\tau)\,df \\
&= N_0 B\,\operatorname{sinc}(2B\tau)[\exp(-j2\pi f_c\tau) + \exp(j2\pi f_c\tau)] \\
&= 2N_0 B\,\operatorname{sinc}(2B\tau)\cos(2\pi f_c\tau)
\end{aligned}
$$
which is plotted in Figure 4.21b. 
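
As a numerical cross-check of this result, the two passband integrals can be evaluated with a midpoint rule and compared with the closed form. The sketch below is illustrative; the function names and the values $N_0 = 2$, $f_c = 50$ Hz, and B = 4 Hz are assumptions, and sinc(x) denotes $\sin(\pi x)/(\pi x)$.

```python
import math


def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def autocorr_bandpass_numeric(tau, n0, fc, bandwidth, steps=20000):
    """Inverse Fourier transform of the two passbands of height N0/2 (midpoint rule)."""
    df = 2.0 * bandwidth / steps
    total = 0.0
    for i in range(steps):
        f = fc - bandwidth + (i + 0.5) * df
        # The bands at +fc and -fc contribute complex-conjugate terms,
        # so their sum is 2 * cos(2 pi f tau).
        total += (n0 / 2.0) * 2.0 * math.cos(2.0 * math.pi * f * tau) * df
    return total


def autocorr_bandpass_closed_form(tau, n0, fc, bandwidth):
    """R_NN(tau) = 2 * N0 * B * sinc(2 * B * tau) * cos(2 pi fc tau)."""
    return (2.0 * n0 * bandwidth * sinc(2.0 * bandwidth * tau)
            * math.cos(2.0 * math.pi * fc * tau))


# Hypothetical parameters chosen for illustration.
n0, fc, bandwidth = 2.0, 50.0, 4.0
for tau in (0.0, 0.003, 0.01, 0.125):
    numeric = autocorr_bandpass_numeric(tau, n0, fc, bandwidth)
    closed = autocorr_bandpass_closed_form(tau, n0, fc, bandwidth)
    print(f"tau = {tau:.3f}  numeric = {numeric:.6f}  closed form = {closed:.6f}")
```

The value at $\tau = 0$ equals $2N_0 B$, the total area under the band-pass power spectral density.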

The spectral density characteristic of Figure 4.21a is symmetric about $\pm f_c$. The
corresponding spectral density characteristics of the in-phase noise component $n_I(t)$ and
the quadrature noise component $n_Q(t)$ are equal, as shown in Figure 4.21c. Scaling the
result of Example 10 by a factor of two in accordance with the spectral characteristics of
Figures 4.21a and 4.21c, we find that the autocorrelation function of $n_I(t)$ or $n_Q(t)$ is
given by

$$R_{N_I N_I}(\tau) = R_{N_Q N_Q}(\tau) = 2N_0 B\,\operatorname{sinc}(2B\tau)$$



Figure 4.21 Characteristics of ideal band-pass filtered white noise: (a) power
spectral density, (b) autocorrelation function, (c) power spectral density of in-phase
and quadrature components.
In the preceding subsection we used the Cartesian representation of a narrowband noise
n(t) in terms of its in-phase and quadrature components. In this subsection we use the
polar representation of the noise n(t) in terms of its envelope and phase components, as
shown by

$$n(t) = r(t)\cos[2\pi f_c t + \psi(t)]$$

where

$$r(t) = [n_I^2(t) + n_Q^2(t)]^{1/2}$$

and

$$\psi(t) = \tan^{-1}\left[\frac{n_Q(t)}{n_I(t)}\right]$$

The function r(t) is the envelope of n(t) and the function $\psi(t)$ is the phase of n(t).

The probability density functions of r(t) and $\psi(t)$ may be obtained from those of
$n_I(t)$ and $n_Q(t)$ as follows. Let $N_I$ and $N_Q$ denote the random variables obtained by
sampling (at some fixed time) the stochastic processes represented by the sample
functions $n_I(t)$ and $n_Q(t)$ respectively. We note that $N_I$ and $N_Q$ are independent Gaussian
random variables of zero mean and variance $\sigma^2$, so we may express their joint probability
density function as

$$f_{N_I,N_Q}(n_I, n_Q) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{n_I^2 + n_Q^2}{2\sigma^2}\right)$$

Accordingly, the probability of the joint event that $N_I$ lies between $n_I$ and $n_I + dn_I$ and $N_Q$
lies between $n_Q$ and $n_Q + dn_Q$ (i.e., the pair of random variables $N_I$ and $N_Q$ lies jointly inside the
shaded area of Figure 4.22a) is given by

$$f_{N_I,N_Q}(n_I, n_Q)\,dn_I\,dn_Q = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{n_I^2 + n_Q^2}{2\sigma^2}\right)dn_I\,dn_Q \tag{4.131}$$

Figure 4.22 Illustrating the coordinate system for representation of narrowband
noise: (a) in terms of in-phase and quadrature components; (b) in terms of envelope and phase.
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where d«j and dtiQ are incrementally small. Now, define the transformations (see Figure 4.22b) 

= rcos yr 
Hq = r sin ^ 

In a limiting sense, we may equate the two incremental areas shown shaded in parts a and 
b of Figure 4.22 and thus write 

d«jd«Q = r dr Ay/ 

Now, let R and Ψ denote the random variables obtained by observing (at some fixed time
t) the stochastic processes represented by the envelope r(t) and phase ψ(t), respectively.
Then, substituting (4.132)–(4.134) into (4.131), we find that the probability of the random
variables R and Ψ lying jointly inside the shaded area of Figure 4.22b is equal to the
expression

$$\frac{r}{2\pi\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right)dr\,d\psi$$

That is, the joint probability density function of R and Ψ is given by

$$f_{R,\Psi}(r,\psi) = \frac{r}{2\pi\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right)$$

This probability density function is independent of the angle ψ, which means that the
random variables R and Ψ are statistically independent. We may thus express f_{R,Ψ}(r, ψ)
as the product of the two probability density functions f_R(r) and f_Ψ(ψ). In particular,
the random variable Ψ representing the phase is uniformly distributed inside the interval
[0, 2π], as shown by

$$f_\Psi(\psi) = \begin{cases} \dfrac{1}{2\pi}, & 0 \le \psi \le 2\pi \\ 0, & \text{elsewhere} \end{cases}$$

This result leaves the probability density function of the random variable R as

$$f_R(r) = \begin{cases} \dfrac{r}{\sigma^2}\exp\left(-\dfrac{r^2}{2\sigma^2}\right), & r \ge 0 \\ 0, & \text{elsewhere} \end{cases} \tag{4.137}$$

where σ² is the variance of the original narrowband noise n(t). A random variable having
the probability density function of (4.137) is said to be Rayleigh distributed.
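
The result can be illustrated by simulation. The sketch below assumes NumPy; the function names, sample size, and seed are hypothetical choices made for illustration. It draws independent zero-mean Gaussian samples for the in-phase and quadrature components, converts them to envelope and phase, and provides the Rayleigh density for comparison.

```python
import numpy as np


def simulate_envelope_phase(sigma, num_samples, seed=0):
    """Draw (N_I, N_Q) as independent zero-mean Gaussians and convert to polar form."""
    rng = np.random.default_rng(seed)
    n_i = rng.normal(0.0, sigma, num_samples)
    n_q = rng.normal(0.0, sigma, num_samples)
    r = np.hypot(n_i, n_q)                            # envelope sqrt(n_I^2 + n_Q^2)
    psi = np.mod(np.arctan2(n_q, n_i), 2.0 * np.pi)   # phase mapped into [0, 2*pi)
    return r, psi


def rayleigh_pdf(r, sigma):
    """Rayleigh density (r/sigma^2) exp(-r^2 / (2 sigma^2)) for r >= 0, zero elsewhere."""
    r = np.asarray(r, dtype=float)
    return np.where(r >= 0.0, (r / sigma**2) * np.exp(-r**2 / (2.0 * sigma**2)), 0.0)


if __name__ == "__main__":
    sigma = 2.0
    r, psi = simulate_envelope_phase(sigma, 200_000)
    print("sample mean of R  :", r.mean(), " theory:", sigma * np.sqrt(np.pi / 2.0))
    print("sample mean of Psi:", psi.mean(), " theory:", np.pi)
    print("corr(R, Psi)      :", np.corrcoef(r, psi)[0, 1])
```

The sample correlation between R and Ψ should be close to zero, which is consistent with (though weaker than) the statistical independence established above.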

For convenience of graphical presentation, let

$$v = \frac{r}{\sigma}$$

$$f_V(v) = \sigma f_R(r)$$

Then, we may rewrite the Rayleigh distribution of (4.137) in the normalized form

$$f_V(v) = \begin{cases} v\exp\left(-\dfrac{v^2}{2}\right), & v \ge 0 \\ 0, & \text{elsewhere} \end{cases} \tag{4.140}$$

Figure 4.23 Normalized Rayleigh distribution.

Equation (4.140) is plotted in Figure 4.23. The peak value of the distribution f_V(v) occurs
at v = 1 and is equal to 0.607. Note also that, unlike the Gaussian distribution, the Rayleigh
distribution is zero for negative values of v, which follows naturally from the fact that the
envelope r(t) of the narrowband noise n(t) can only assume nonnegative values.
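
The quoted peak follows from a one-line calculation. As a short worked step, taking the normalized density to be f_V(v) = v exp(-v²/2) for v ≥ 0 and setting its derivative to zero:

```latex
\begin{aligned}
\frac{d f_V(v)}{dv} &= (1 - v^2)\exp\left(-\frac{v^2}{2}\right) = 0
  \quad\Longrightarrow\quad v = 1 \\
f_V(1) &= \exp\left(-\tfrac{1}{2}\right) \approx 0.607
\end{aligned}
```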

Sine Wave Plus Narrowband Noise

Suppose next that we add the sinusoidal wave A cos(2πf_c t) to the narrowband noise n(t),
where A and f_c are both constants. We assume that the frequency of the sinusoidal wave is
the same as the nominal carrier frequency of the noise. A sample function of the sinusoidal
wave plus noise is then expressed by

$$x(t) = A\cos(2\pi f_c t) + n(t) \tag{4.141}$$

Representing the narrowband noise n(t) in terms of its in-phase and quadrature
components, we may write

$$x(t) = n'_I(t)\cos(2\pi f_c t) - n_Q(t)\sin(2\pi f_c t) \tag{4.142}$$

where

$$n'_I(t) = A + n_I(t)$$

We assume that n(t) is Gaussian with zero mean and variance σ². Accordingly, we may
state the following:

1. Both n'_I(t) and n_Q(t) are Gaussian and statistically independent.
2. The mean of n'_I(t) is A and that of n_Q(t) is zero.
3. The variance of both n'_I(t) and n_Q(t) is σ².

We may, therefore, express the joint probability density function of the random variables
N'_I and N_Q, corresponding to n'_I(t) and n_Q(t), as follows:

$$f_{N'_I,N_Q}(n'_I,n_Q) = \frac{1}{2\pi\sigma^2}\exp\left[-\frac{(n'_I - A)^2 + n_Q^2}{2\sigma^2}\right]$$

Let r(t) denote the envelope of x(t) and ψ(t) denote its phase. From (4.142), we thus find
that

$$r(t) = \{[n'_I(t)]^2 + n_Q^2(t)\}^{1/2}$$

and

$$\psi(t) = \tan^{-1}\left[\frac{n_Q(t)}{n'_I(t)}\right]$$

Following a procedure similar to that described in Section 4.12 for the derivation of the
Rayleigh distribution, we find that the joint probability density function of the random
variables R and Ψ, corresponding to r(t) and ψ(t) for some fixed time t, is given by

$$f_{R,\Psi}(r,\psi) = \frac{r}{2\pi\sigma^2}\exp\left(-\frac{r^2 + A^2 - 2Ar\cos\psi}{2\sigma^2}\right) \tag{4.147}$$

We see that in this case, however, we cannot express the joint probability density function
f_{R,Ψ}(r, ψ) as a product f_R(r) f_Ψ(ψ), because we now have a term involving the values of
both random variables multiplied together as r cos ψ. Hence, R and Ψ are dependent
random variables for nonzero values of the amplitude A of the sinusoidal component.

We are interested, in particular, in the probability density function of R. To determine
this probability density function, we integrate (4.147) over all possible values of ψ,
obtaining the desired marginal density

$$f_R(r) = \int_0^{2\pi} f_{R,\Psi}(r,\psi)\,d\psi = \frac{r}{2\pi\sigma^2}\exp\left(-\frac{r^2 + A^2}{2\sigma^2}\right)\int_0^{2\pi}\exp\left(\frac{Ar}{\sigma^2}\cos\psi\right)d\psi \tag{4.148}$$

An integral similar to that on the right-hand side of (4.148) is referred to in the literature as
the modified Bessel function of the first kind of zero order (see Appendix C); that is,

$$I_0(x) = \frac{1}{2\pi}\int_0^{2\pi}\exp(x\cos\psi)\,d\psi$$

Thus, letting x = Ar/σ², we may rewrite (4.148) in the compact form

$$f_R(r) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2 + A^2}{2\sigma^2}\right) I_0\left(\frac{Ar}{\sigma^2}\right), \quad r \ge 0 \tag{4.150}$$

This new distribution is called the Rician distribution.
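
The marginalization step can be checked numerically. The following is a sketch under stated assumptions: NumPy is available, `np.i0` is used for the modified Bessel function I_0, and the function names and test values are illustrative rather than taken from the text. It integrates the joint density of R and Ψ over the phase and compares the result with the closed-form Rician density.

```python
import numpy as np


def joint_pdf(r, psi, amp, sigma):
    """Joint density of envelope R and phase Psi for a sine wave plus narrowband noise."""
    return (r / (2.0 * np.pi * sigma**2)) * np.exp(
        -(r**2 + amp**2 - 2.0 * amp * r * np.cos(psi)) / (2.0 * sigma**2)
    )


def rician_pdf(r, amp, sigma):
    """Closed-form marginal density of R, written with I_0(A r / sigma^2)."""
    return (r / sigma**2) * np.exp(-(r**2 + amp**2) / (2.0 * sigma**2)) * np.i0(amp * r / sigma**2)


def marginal_by_integration(r, amp, sigma, num=2048):
    """Integrate the joint density over psi in [0, 2*pi) on an equally spaced grid."""
    psi = np.linspace(0.0, 2.0 * np.pi, num, endpoint=False)
    # For a smooth periodic integrand the equally spaced mean converges very quickly.
    return 2.0 * np.pi * np.mean(joint_pdf(r, psi, amp, sigma))


if __name__ == "__main__":
    amp, sigma = 2.0, 1.0  # illustrative values only
    for r in (0.1, 1.0, 2.5, 4.0):
        print(r, marginal_by_integration(r, amp, sigma), float(rician_pdf(r, amp, sigma)))
```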
Figure 4.24 Normalized Rician distribution.

As with the Rayleigh distribution, the graphical presentation of the Rician distribution
is simplified by putting

$$v = \frac{r}{\sigma}$$

$$a = \frac{A}{\sigma}$$

$$f_V(v) = \sigma f_R(r)$$

Then we may express the Rician distribution of (4.150) in the normalized form

$$f_V(v) = v\exp\left(-\frac{v^2 + a^2}{2}\right) I_0(av)$$

which is plotted in Figure 4.24 for the values 0, 1, 2, 3, 5 of the parameter a. Based on
these curves, we may make two observations:

1. When the parameter a = 0, and therefore I_0(0) = 1, the Rician distribution reduces to
the Rayleigh distribution.

2. The envelope distribution is approximately Gaussian in the vicinity of v = a when a
is large; that is, when the sine-wave amplitude A is large compared with σ, the
square root of the average power of the noise n(t).
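
Both observations can be explored numerically. The sketch below assumes NumPy and takes the normalized Rician density to be f_V(v) = v exp(-(v² + a²)/2) I_0(av); the function names and the grid are illustrative choices. It evaluates the density for the same values of a, compares the a = 0 case with the normalized Rayleigh density, and locates the peak.

```python
import numpy as np


def normalized_rician_pdf(v, a):
    """Normalized Rician density v * exp(-(v^2 + a^2)/2) * I_0(a v) for v >= 0."""
    v = np.asarray(v, dtype=float)
    return v * np.exp(-(v**2 + a**2) / 2.0) * np.i0(a * v)


def normalized_rayleigh_pdf(v):
    """Normalized Rayleigh density v * exp(-v^2/2) for v >= 0."""
    v = np.asarray(v, dtype=float)
    return v * np.exp(-v**2 / 2.0)


def area_under(pdf_values, grid):
    """Trapezoidal estimate of the area under a sampled density."""
    return np.sum((pdf_values[1:] + pdf_values[:-1]) * np.diff(grid)) / 2.0


if __name__ == "__main__":
    v = np.linspace(0.0, 15.0, 15001)
    for a in (0.0, 1.0, 2.0, 3.0, 5.0):
        pdf = normalized_rician_pdf(v, a)
        print("a =", a, " area =", area_under(pdf, v), " peak at v =", v[np.argmax(pdf)])
    # First observation: with a = 0 the two densities coincide.
    print(np.max(np.abs(normalized_rician_pdf(v, 0.0) - normalized_rayleigh_pdf(v))))
```

For large a the peak settles close to v = a, in line with the second observation.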

Summary and Discussion 


Much of the material presented in this chapter has dealt with the characterization of a
particular class of stochastic processes known to be weakly stationary. The implication of
“weak” stationarity is that we may develop a partial description of a stochastic process in
terms of two ensemble-averaged parameters: (1) a mean that is independent of time and
(2) an autocorrelation function that depends only on the difference between the times at
which two samples of the process are drawn. We also discussed ergodicity, which enables
us to use time averages as “estimates” of these parameters. The time averages are
computed using a sample function (i.e., single waveform realization) of the stochastic
process, evolving as a function of time.

The autocorrelation function R_XX(τ), expressed in terms of the time shift τ, is one way
of describing the second-order statistic of a weakly (wide-sense) stationary process X(t).
Another equally important parameter, if not more so, for describing the second-order
statistic of X(t) is the power spectral density S_XX(f), expressed in terms of the frequency f.
The Fourier transform and the inverse Fourier transform formulas that relate these two
parameters to each other constitute the celebrated Wiener-Khintchine equations. The first
of these two equations, namely (4.42), provides the basis for a definition of the power
spectral density S_XX(f) as the Fourier transform of the autocorrelation function R_XX(τ),
given that R_XX(τ) is known. This definition was arrived at by working on the output of a
linear time-invariant filter, driven by a weakly stationary process X(t). We also described
another definition for the power spectral density, given in (4.70); this second
definition was derived by working directly on the process X(t).
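
As a numerical illustration of the first Wiener-Khintchine equation, the sketch below assumes NumPy and an exponential autocorrelation function R(τ) = exp(-α|τ|), which is chosen here purely as an example; the names `psd_from_autocorr`, `psd_closed_form`, and `rate` are hypothetical. It evaluates the Fourier transform of R(τ) by direct integration and compares it with the known transform 2α/(α² + (2πf)²).

```python
import numpy as np


def psd_from_autocorr(f, rate, half_width=20.0, num=40001):
    """Fourier transform of R(tau) = exp(-rate*|tau|) at frequency f, by integration."""
    tau = np.linspace(-half_width, half_width, num)  # odd num keeps tau = 0 on the grid
    r = np.exp(-rate * np.abs(tau))
    # R(tau) is even, so only the cosine part of exp(-j*2*pi*f*tau) contributes.
    integrand = r * np.cos(2.0 * np.pi * f * tau)
    return np.sum((integrand[1:] + integrand[:-1]) * np.diff(tau)) / 2.0


def psd_closed_form(f, rate):
    """Known transform of exp(-rate*|tau|): 2*rate / (rate^2 + (2*pi*f)^2)."""
    return 2.0 * rate / (rate**2 + (2.0 * np.pi * f) ** 2)


if __name__ == "__main__":
    rate = 2.0  # illustrative decay rate
    for f in (0.0, 0.25, 1.0):
        print(f, psd_from_autocorr(f, rate), psd_closed_form(f, rate))
```

The resulting spectrum is real, even, and nonnegative, as a power spectral density must be.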

Another celebrated theorem discussed in the chapter is the Wiener-Khintchine
theorem, which provides the necessary and sufficient condition for confirming the
function ρ_XX(τ) as the normalized autocorrelation function of a weakly stationary process
X(t), provided that it satisfies the Fourier-Stieltjes transform, described in (4.60).

The stochastic-process theory described in this chapter also included the topic of cross-
power spectral densities S_XY(f) and S_YX(f), involving a pair of jointly weakly stationary
processes X(t) and Y(t), and how these two frequency-dependent parameters are related to
the respective cross-correlation functions R_XY(τ) and R_YX(τ).

The remaining part of the chapter was devoted to the statistical characterization of 
different kinds of stochastic processes: 

• The Poisson process, which is well-suited for the characterization of random- 
counting processes. 

• The ubiquitous Gaussian process, which is widely used in the statistical study of 
communication systems. 

• The two kinds of electrical noise, namely shot noise and thermal noise. 

• White noise, which plays a fundamental role in the noise analysis of communication 
systems similar to that of the impulse function in the study of linear systems. 

• Narrowband noise, which is produced by passing white noise through a linear band- 
pass filter. Two different methods for the description of narrowband noise were 
presented: one in terms of the in-phase and quadrature components and the other in 
terms of the envelope and phase. 

• The Rayleigh distribution, which is described by the envelope of a narrowband noise 
process. 

• The Rician distribution, which is described by the envelope of narrowband noise 
plus a sinusoidal component, with the midband frequency of the narrowband noise 
and the frequency of the sinusoidal component being coincident. 

We conclude this chapter on stochastic processes by including Table 4.1, where we present
a graphical summary of the autocorrelation functions and power spectral densities of
important stochastic processes. All the processes described in this table are assumed to
have zero mean and unit variance. This table should give the reader a feeling for (1) the
interplay between the autocorrelation function and power spectral density of a stochastic
process and (2) the role of linear filtering in shaping the autocorrelation function or,
equivalently, the power spectral density of a white-noise process.

Table 4.1 Graphical summary of autocorrelation functions and power
spectral densities of random processes of zero mean and unit variance

Problems

Stationarity and Ergodicity

4.1 Consider a pair of stochastic processes X(t) and Y(t). In the strictly stationary world of stochastic
processes, the statistical independence of X(t) and Y(t) corresponds to their uncorrelatedness in the
world of weakly stationary processes. Justify this statement.

4.2 Let X_1, X_2, ..., X_k denote a sequence obtained by uniformly sampling a stochastic process X(t). The
sequence consists of statistically independent and identically distributed (iid) random variables, with
a common cumulative distribution function F_X(x), mean μ, and variance σ². Show that this sequence
is strictly stationary.

4.3 A stochastic process X(t) is defined by

$$X(t) = A\cos(2\pi f_c t)$$

where A is a Gaussian-distributed random variable of zero mean and variance σ_A². The process X(t) is
applied to an ideal integrator, producing the output

$$Y(t) = \int_0^t X(\tau)\,d\tau$$

a. Determine the probability density function of the output Y(t) at a particular time t_k.

b. Determine whether or not Y(t) is strictly stationary.

4.4 Continuing with Problem 4.3, determine whether or not the integrator output Y(t) produced in
response to the input process X(t) is ergodic.


Autocorrelation Function and Power Spectral Density 


4.5 The square wave x(t) of Figure P4.5, having constant amplitude A, period T_0, and time shift t_d,
represents the sample function of a stochastic process X(t). The time shift t_d is a random variable,
described by the probability density function

$$f_{T_d}(t_d) = \begin{cases} \dfrac{1}{T_0}, & -\dfrac{1}{2}T_0 \le t_d \le \dfrac{1}{2}T_0 \\ 0, & \text{otherwise} \end{cases}$$

a. Determine the probability density function of the random variable X(t_k), obtained by sampling
the stochastic process X(t) at time t_k.

b. Determine the mean and autocorrelation function of X(t) using ensemble averaging.

c. Determine the mean and autocorrelation function of X(t) using time averaging.

d. Establish whether or not X(t) is weakly stationary. In what sense is it ergodic?

Figure P4.5

4.6 A binary wave consists of a random sequence of symbols 1 and 0, similar to that described in
Example 6, with one basic difference: symbol 1 is now represented by a pulse of amplitude A volts,
and symbol 0 is represented by zero volts. All other parameters are the same as before. Show that
this new random binary wave X(t) is characterized as follows:

a. The autocorrelation function is

$$R_{XX}(\tau) = \begin{cases} \dfrac{A^2}{4} + \dfrac{A^2}{4}\left(1 - \dfrac{|\tau|}{T}\right), & |\tau| < T \\ \dfrac{A^2}{4}, & |\tau| \ge T \end{cases}$$

b. The power spectral density is

$$S_{XX}(f) = \frac{A^2}{4}\delta(f) + \frac{A^2 T}{4}\,\mathrm{sinc}^2(fT)$$

What is the percentage power contained in the dc component of the binary wave?

4.7 The output of an oscillator is described by

$$X(t) = A\cos(2\pi F t + \Theta)$$

where the amplitude A is constant, and F and Θ are independent random variables. The probability
density function of Θ is defined by

$$f_\Theta(\theta) = \begin{cases} \dfrac{1}{2\pi}, & 0 \le \theta \le 2\pi \\ 0, & \text{otherwise} \end{cases}$$

Find the power spectral density of X(t) in terms of the probability density function of the frequency F.
What happens to this power spectral density when the randomized frequency F assumes a constant
value?


4.8 Equation (4.70) presents the second of two definitions introduced in the chapter for the power
spectral density function S_XX(f), pertaining to a weakly stationary process X(t). This definition
reconfirms Property 3 of S_XX(f), as shown in (4.71).

a. Using (4.70), prove the other properties of S_XX(f): zero correlation among frequency
components, zero-frequency value, nonnegativity, symmetry, and normalization, which were
discussed in Section 4.8.

b. Starting with (4.70), derive (4.43) that defines the autocorrelation function R_XX(τ) of the
stationary process X(t) in terms of S_XX(f).

4.9 In the definition of (4.70) for the power spectral density of a weakly stationary process X(t), it is not
permissible to interchange the order of expectation and limiting operations. Justify the validity of
this statement.


The Wiener-Khintchine Theorem

In the next four problems we explore the application of the Wiener-Khintchine theorem of (4.60) to
see whether a given function ρ(τ), expressed in terms of the time shift τ, is a legitimate normalized
autocorrelation function or not.

4.10 Consider the Fourier transformable function

/O) = — sin(2jt/ c r) for all r 

By inspection, we see that f(τ) is an odd function of τ. It cannot, therefore, be a legitimate
autocorrelation function as it violates a fundamental property of the autocorrelation function. Apply
the Wiener-Khintchine theorem to arrive at this same conclusion.

4.11 Consider the infinite series

/(/) = -^(27t/ c r)' + ^( 2 7t/ c r) 4 - ...J for all r 

which is an even function of τ, thereby satisfying the symmetry property of the autocorrelation
function. Apply the Wiener-Khintchine theorem to confirm that f(τ) is indeed a legitimate
autocorrelation function of a weakly stationary process.

4.12 Consider the Gaussian function

$$f(\tau) = \exp(-\pi\tau^2) \quad \text{for all } \tau$$

which is Fourier transformable. Moreover, it is an even function of τ, thereby satisfying the
symmetry property of the autocorrelation function around the origin τ = 0. Apply the Wiener-
Khintchine theorem to confirm that f(τ) is indeed a legitimate normalized autocorrelation function
of a weakly stationary process.

4.13 Consider the Fourier transformable function



0, otherwise 

which is an odd function of τ. It cannot, therefore, be a legitimate autocorrelation function. Apply
the Wiener-Khintchine theorem to arrive at this same conclusion.

Cross-correlation Functions and Cross-spectral Densities

4.14 Consider a pair of weakly stationary processes X(t) and Y(t). Show that the cross-correlations
R_XY(τ) and R_YX(τ) of these two processes have the following properties:

a. R_XY(τ) = R_YX(-τ)

b. |R_XY(τ)| ≤ (1/2)[R_XX(0) + R_YY(0)]

where R_XX(τ) and R_YY(τ) are the autocorrelation functions of X(t) and Y(t), respectively.

4.15 A weakly stationary process X(t), with zero mean and autocorrelation function R_XX(τ), is passed
through a differentiator, yielding the new process

$$Y(t) = \frac{d}{dt}X(t)$$

a. Determine the autocorrelation function of Y(t).

b. Determine the cross-correlation function between X(t) and Y(t).

4.16 Consider two linear filters connected in cascade as in Figure P4.16. Let X(t) be a weakly stationary
process with autocorrelation function R_XX(τ). The weakly stationary process appearing at the first
filter output is denoted by V(t) and that at the second filter output is denoted by Y(t).

a. Find the autocorrelation function of Y(t).

b. Find the cross-correlation function R_VY(τ) of V(t) and Y(t).

Figure P4.16

4.17 A weakly stationary process X(t) is applied to a linear time-invariant filter of impulse response h(t),
producing the output Y(t).

a. Show that the cross-correlation function R_YX(τ) of the output Y(t) and the input X(t) is equal to
the impulse response h(τ) convolved with the autocorrelation function R_XX(τ) of the input, as
shown by

$$R_{YX}(\tau) = \int_{-\infty}^{\infty} h(u)\,R_{XX}(\tau - u)\,du$$

b. Show that the second cross-correlation function R_XY(τ) is

$$R_{XY}(\tau) = \int_{-\infty}^{\infty} h(-u)\,R_{XX}(\tau - u)\,du$$

c. Find the cross-spectral densities S_YX(f) and S_XY(f).

d. Assuming that X(t) is a white-noise process with zero mean and power spectral density N_0/2,
show that

$$R_{YX}(\tau) = \frac{N_0}{2}h(\tau)$$

Comment on the practical significance of this result.


Poisson Process

4.18 The sample function of a stochastic process X(t) is shown in Figure P4.18a, where we see that the
sample function x(t) assumes the values ±1 in a random manner. It is assumed that at time t = 0, the
values X(0) = -1 and X(0) = +1 are equiprobable. From there on, the changes in X(t) occur in
accordance with a Poisson process of average rate λ. The process X(t), described herein, is
sometimes referred to as a telegraph signal.

a. Show that, for any time t > 0, the values X(t) = -1 and X(t) = +1 are equiprobable.

b. Building on the result of part a, show that the mean of X(t) is zero and its variance is unity.

c. Show that the autocorrelation function of X(t) is given by

$$R_{XX}(\tau) = \exp(-2\lambda|\tau|)$$

d. The process X(t) is applied to the simple low-pass filter of Figure P4.18b. Determine the power
spectral density of the process Y(t) produced at the filter output.

Figure P4.18 (a) Sample function x(t). (b) Poisson process X(t) applied to the low-pass filter H(f),
producing the output process Y(t).

Gaussian Process

4.19 Consider the pair of integrals

$$Y_1 = \int h_1(t)\,X(t)\,dt$$

and

$$Y_2 = \int h_2(t)\,X(t)\,dt$$

where X(t) is a Gaussian process and h_1(t) and h_2(t) are two different weighting functions. Show
that the two random variables Y_1 and Y_2, resulting from the integrations, are jointly Gaussian.

4.20 A Gaussian process X(t), with zero mean and variance σ_X², is passed through a full-wave rectifier,
which is described by the input-output relationship of Figure P4.20. Show that the probability
density function of the random variable Y(t_k), obtained by observing the stochastic process Y(t)
produced at the rectifier output at time t_k, is one sided, as shown by

$$f_{Y(t_k)}(y) = \begin{cases} \sqrt{\dfrac{2}{\pi}}\,\dfrac{1}{\sigma_X}\exp\left(-\dfrac{y^2}{2\sigma_X^2}\right), & y \ge 0 \\ 0, & y < 0 \end{cases}$$

Confirm that the total area under the graph of f_{Y(t_k)}(y) is unity.

Figure P4.20



4.21 A stationary Gaussian process X(t), with mean μ_X and variance σ_X², is passed through two linear
filters with impulse responses h_1(t) and h_2(t), yielding the processes Y(t) and Z(t), as shown in
Figure P4.21. Determine the necessary and sufficient conditions for which Y(t_1) and Z(t_2) are
statistically independent Gaussian processes.

Figure P4.21


White Noise

4.22 Consider the stochastic process

$$X(t) = W(t) + aW(t - t_0)$$

where W(t) is a white-noise process of power spectral density N_0/2 and the parameters a and t_0 are
constants.

a. Determine the autocorrelation function of the process X(t), and sketch it.

b. Determine the power spectral density of the process X(t), and sketch it.

4.23 The process

$$X(t) = A\cos(2\pi f_0 t + \Theta) + W(t)$$

describes a sinusoidal process that is corrupted by an additive white-noise process W(t) of known
power spectral density N_0/2. The phase of the sinusoidal process, denoted by Θ, is a uniformly
distributed random variable, defined by

$$f_\Theta(\theta) = \begin{cases} \dfrac{1}{2\pi}, & -\pi \le \theta \le \pi \\ 0, & \text{otherwise} \end{cases}$$

The amplitude A and frequency f_0 are both constant but unknown.

a. Determine the autocorrelation function of the process X(t) and its power spectral density.

b. How would you use the two results of part a to measure the unknown parameters A and f_0?

4.24 A white Gaussian noise process of zero mean and power spectral density N_0/2 is applied to the
filtering scheme shown in Figure P4.24. The noise at the low-pass filter output is denoted by n(t).

a. Find the power spectral density and the autocorrelation function of n(t).

b. Find the mean and variance of n(t).

c. What is the maximum rate at which n(t) can be sampled so that the resulting samples are
essentially uncorrelated?

Figure P4.24 (a) Filtering scheme: white noise, band-pass filter H_1(f), multiplication by cos(2πf_c t),
low-pass filter H_2(f), output. (b) Filter magnitude responses.


4.25 Let X(t) be a weakly stationary process with zero mean, autocorrelation function R_XX(τ), and power
spectral density S_XX(f). We are required to find a linear filter with impulse response h(t), such that
the filter output is X(t) when the input is white noise of power spectral density N_0/2.

a. Determine the condition that the impulse response h(t) must satisfy in order to achieve this
requirement.

b. What is the corresponding condition on the transfer function H(f) of the filter?

c. Using the Paley-Wiener criterion discussed in Chapter 2, find the requirement on S_XX(f) for the
filter to be causal.

Narrowband Noise

4.26 Consider a narrowband noise n(t) with its Hilbert transform denoted by n̂(t).

a. Show that the cross-correlation functions of n(t) and n̂(t) are given by

$$R_{N\hat{N}}(\tau) = -\hat{R}_{NN}(\tau)$$

and

$$R_{\hat{N}N}(\tau) = \hat{R}_{NN}(\tau)$$

where R̂_NN(τ) is the Hilbert transform of the autocorrelation function R_NN(τ) of n(t).
Hint: use the formula 



b. Show that, for τ = 0, we have

$$R_{N\hat{N}}(0) = R_{\hat{N}N}(0) = 0$$

4.27 A narrowband noise n(t) has zero mean and autocorrelation function R_NN(τ). Its power spectral
density S_NN(f) is centered about ±f_c. The in-phase and quadrature components, n_I(t) and n_Q(t), of
n(t) are defined by the weighted sums

$$n_I(t) = n(t)\cos(2\pi f_c t) + \hat{n}(t)\sin(2\pi f_c t)$$

and

$$n_Q(t) = \hat{n}(t)\cos(2\pi f_c t) - n(t)\sin(2\pi f_c t)$$

where n̂(t) is the Hilbert transform of the noise n(t). Using the result obtained in part a of Problem
4.26, show that n_I(t) and n_Q(t) have the following autocorrelation and cross-correlation functions:

$$R_{N_I N_I}(\tau) = R_{N_Q N_Q}(\tau) = R_{NN}(\tau)\cos(2\pi f_c\tau) + \hat{R}_{NN}(\tau)\sin(2\pi f_c\tau)$$

and

$$R_{N_I N_Q}(\tau) = -R_{N_Q N_I}(\tau) = R_{NN}(\tau)\sin(2\pi f_c\tau) - \hat{R}_{NN}(\tau)\cos(2\pi f_c\tau)$$


Rayleigh and Rician Distributions

4.28 Consider the problem of propagating signals through so-called random or fading communications
channels. Examples of such channels include the ionosphere, from which short-wave (high-
frequency) signals are reflected back to the earth producing long-range radio transmission, and
underwater communications. A simple model of such a channel is shown in Figure P4.28, which
consists of a large collection of random scatterers, with the result that a single incident beam is
converted into a correspondingly large number of scattered beams at the receiver. The transmitted
signal is equal to A exp(j2πf_c t). Assume that all scattered beams travel at the same mean velocity.
However, each scattered beam differs in amplitude and phase from the incident beam, so that the kth
scattered beam is given by A_k exp(j2πf_c t + jΘ_k), where the amplitude A_k and the phase Θ_k vary
slowly and randomly with time. In particular, assume that the Θ_k are all independent of one another
and uniformly distributed random variables.

a. With the received signal denoted by

$$x(t) = r(t)\exp[j2\pi f_c t + j\psi(t)]$$

show that the random variable R, obtained by observing the envelope of the received signal at
time t, is Rayleigh distributed, and that the random variable Ψ, obtained by observing the phase
at some fixed time, is uniformly distributed.

b. Assuming that the channel includes a line-of-sight path, so that the received signal contains a
sinusoidal component of frequency f_c, show that in this case the envelope of the received signal is
Rician distributed.

4.29 Referring back to the graphical plots of Figure 4.24, describing the Rician envelope distribution for
varying parameter a, we see that for the parameter a = 5, this distribution is approximately Gaussian.
Justify the validity of this statement.

Notes 


1. Stochastic is of Greek origin.

2. For rigorous treatment of stochastic processes, see the classic books by Doob (1953), Loeve 
(1963), and Cramer and Leadbetter (1967). 

3. Traditionally, (4.42) and (4.43) have been referred to in the literature as the Wiener-Khintchine 
relations in recognition of pioneering work done by Norbert Wiener and A. I. Khintchine; for their 
original papers, see Wiener (1930) and Khintchine (1934). The discovery of a forgotten paper by 
Albert Einstein on time-series analysis (delivered at the Swiss Physical Society’s February 1914 
meeting in Basel) reveals that Einstein had discussed the autocorrelation function and its relationship 
to the spectral content of a time series many years before Wiener and Khintchine. An English 
translation of Einstein's paper is reproduced in the IEEE ASSP Magazine, vol. 4, October 1987. This 
particular issue also contains articles by W.A. Gardner and A.M. Yaglom, which elaborate on 
Einstein’s original work. 

4. For a mathematical proof of the Wiener-Khintchine theorem, see Priestley (1981). 

5. Equation (4.70) provides the mathematical basis for estimating the power spectral density of a 
weakly stationary process. There is a plethora of procedures that have been formulated for 
performing this estimation. For a detailed treatment of reliable procedures to do the estimation, see 
the book by Percival and Walden (1993). 

6. The Poisson process is named in honor of S.D. Poisson. The distribution bearing his name first 
appeared in an exposition by Poisson on the role of probability in the administration of justice. The 
classic book on Poisson processes is Snyder (1975). For an introductory treatment of the subject, see 
Bertsekas and Tsitsiklis (2008: Chapter 6). 

7. The Gaussian distribution and the associated Gaussian process are named after the great
mathematician C.F. Gauss. At age 18, Gauss invented the method of least squares for finding the
best value of a sequence of measurements of some quantity. Gauss later used the method of least
squares in fitting orbits of planets to data measurements, a procedure that was published in 1809 in
his book entitled Theory of Motion of the Heavenly Bodies. In connection with the error of
observation, he developed the Gaussian distribution.

8. Thermal noise was first studied experimentally by J.B. Johnson in 1928, and for this reason it is
sometimes referred to as Johnson noise. Johnson’s experiments were confirmed theoretically by
Nyquist (1928a).

9. For further insight into white noise, see Appendix I on generalized random processes in the book 
by Yaglom (1962). 

10. The Rayleigh distribution is named in honor of the English physicist J.W. Strutt, Lord Rayleigh. 

11. The Rician distribution is named in honor of S.O. Rice (1945). 

12. In mobile wireless communications to be covered in Chapter 9, the sinusoidal term
A cos(2πf_c t) in (4.141) is viewed as a line-of-sight (LOS) component of average power A²/2 and the
additive noise term n(t) is viewed as a Gaussian diffuse component of average power σ², with both
being assumed to have zero mean. In such an environment, it is the Rice factor K that is used to
characterize the Rician distribution. Formally, we write

$$K = \frac{\text{Average power of the LOS component}}{\text{Average power of the diffuse component}} = \frac{A^2}{2\sigma^2}$$

In effect, K = a²/2, where a = A/σ. Thus for the graphical plots of Figure 4.24, the running parameter K would
assume the values 0, 0.5, 2, 4.5, 12.5.
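
The listed values follow directly from K = a²/2. A minimal sketch in Python (the function name `rice_factor` is illustrative, not from the text):

```python
def rice_factor(a):
    """Rice factor K = (A^2 / 2) / sigma^2, written in terms of a = A / sigma."""
    return a**2 / 2.0


# Values of the normalized amplitude a used for the plotted Rician curves.
a_values = (0, 1, 2, 3, 5)
k_values = [rice_factor(a) for a in a_values]

if __name__ == "__main__":
    for a, k in zip(a_values, k_values):
        print("a =", a, " K =", k)
```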


Information Theory 


Introduction 


As mentioned in Chapter 1 and reiterated along the way, the purpose of a communication 
system is to facilitate the transmission of signals generated by a source of information over a 
communication channel. But, in basic terms, what do we mean by the term information? To 
address this important issue, we need to understand the fundamentals of information theory. 

The rationale for studying the fundamentals of information theory at this early stage in
the book is threefold:

1. Information theory makes extensive use of probability theory, which we studied in
Chapter 3; it is, therefore, a logical follow-up to that chapter.

2. It adds meaning to the term “information” used in previous chapters of the book.

3. Most importantly, information theory paves the way for many important concepts
and topics discussed in subsequent chapters.

In the context of communications, information theory deals with mathematical modeling
and analysis of a communication system rather than with physical sources and physical
channels. In particular, it provides answers to two fundamental questions (among others):

1. What is the irreducible complexity, below which a signal cannot be compressed?

2. What is the ultimate transmission rate for reliable communication over a noisy channel?

The answers to these two questions lie in the entropy of a source and the capacity of a
channel, respectively:

1. Entropy is defined in terms of the probabilistic behavior of a source of information;
it is so named in deference to the parallel use of this concept in thermodynamics.

2. Capacity is defined as the intrinsic ability of a channel to convey information; it is
naturally related to the noise characteristics of the channel.

A remarkable result that emerges from information theory is that if the entropy of the 
source is less than the capacity of the channel, then, ideally, error-free communication over 
the channel can be achieved. It is, therefore, fitting that we begin our study of information 
theory by discussing the relationships among uncertainty, information, and entropy. 

Entropy 


Suppose that a probabilistic experiment involves observation of the output emitted by a
discrete source during every signaling interval. The source output is modeled as a
stochastic process, a sample of which is denoted by the discrete random variable S. This
random variable takes on symbols from the fixed finite alphabet

$$\mathcal{S} = \{s_0, s_1, \ldots, s_{K-1}\}$$

with probabilities

$$P(S = s_k) = p_k, \quad k = 0, 1, \ldots, K-1 \tag{5.2}$$

Of course, this set of probabilities must satisfy the normalization property

$$\sum_{k=0}^{K-1} p_k = 1, \quad p_k \ge 0$$

We assume that the symbols emitted by the source during successive signaling intervals 
are statistically independent. Given such a scenario, can we find a measure of how much 
information is produced by such a source? To answer this question, we recognize that the 
idea of information is closely related to that of uncertainty or surprise, as described next. 

Consider the event $S = s_k$, describing the emission of symbol $s_k$ by the source with
probability $p_k$, as defined in (5.2). Clearly, if the probability $p_k = 1$ and $p_i = 0$ for
all $i \ne k$, then there is no “surprise” and, therefore, no “information” when symbol $s_k$ is
emitted, because we know what the message from the source must be. If, on the other hand,
the source symbols occur with different probabilities and the probability $p_k$ is low, then
there is more surprise and, therefore, more information when symbol $s_k$ is emitted by the
source than when another symbol $s_i$, $i \ne k$, with higher probability is emitted. Thus, the
words uncertainty, surprise, and information are all related. Before the event $S = s_k$
occurs, there is an amount of uncertainty. When the event $S = s_k$ occurs, there is an amount
of surprise. After the occurrence of the event $S = s_k$, there is a gain in the amount of
information, the essence of which may be viewed as the resolution of uncertainty. Most
importantly, the amount of information is related to the inverse of the probability of
occurrence of the event $S = s_k$.

We define the amount of information gained after observing the event $S = s_k$, which
occurs with probability $p_k$, as the logarithmic function

$$I(s_k) = \log\left(\frac{1}{p_k}\right) \tag{5.4}$$

which is often termed the “self-information” of the event $S = s_k$. This definition exhibits
the following important properties that are intuitively satisfying:

$$I(s_k) = 0 \quad \text{for } p_k = 1$$

Obviously, if we are absolutely certain of the outcome of an event, even before it occurs,
there is no information gained.

$$I(s_k) \ge 0 \quad \text{for } 0 \le p_k \le 1$$

That is to say, the occurrence of an event $S = s_k$ either provides some or no information,
but never brings about a loss of information.

$$I(s_k) > I(s_i) \quad \text{for } p_k < p_i$$

That is, the less probable an event is, the more information we gain when it occurs.

$$I(s_k, s_l) = I(s_k) + I(s_l) \quad \text{if } s_k \text{ and } s_l \text{ are statistically independent}$$

This additive property follows from the logarithmic definition described in (5.4).

The base of the logarithm in (5.4) specifies the units of information measure. 
Nevertheless, it is standard practice in information theory to use a logarithm to base 2 with 
binary signaling in mind. The resulting unit of information is called the bit, which is a 
contraction of the words binary digit. We thus write 

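$$\begin{aligned}
I(s_k) &= \log_2\left(\frac{1}{p_k}\right) \\
       &= -\log_2 p_k \quad \text{for } k = 0, 1, \ldots, K-1
\end{aligned}$$

When $p_k = 1/2$, we have $I(s_k) = 1$ bit. We may, therefore, state:

One bit is the amount of information that we gain when one of two possible and equally
likely (i.e., equiprobable) events occurs.

Note that the information $I(s_k)$ is positive, because the logarithm of a number less than
one, such as a probability, is negative. Note also that if $p_k$ is zero, then the
self-information $I(s_k)$ assumes an unbounded value.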
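The properties of self-information listed above can be checked numerically. The following is a minimal sketch, assuming base-2 logarithms throughout; the function name `self_information` is our own choice for illustration and does not come from the text.

```python
import math

def self_information(p: float) -> float:
    """Self-information I = log2(1/p) in bits for an event of probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if p == 0.0:
        return math.inf  # unbounded value for a zero-probability event
    return math.log2(1.0 / p)

# A certain event carries no information; an equiprobable binary event carries 1 bit.
certain = self_information(1.0)
one_bit = self_information(0.5)

# Additivity for statistically independent events with probabilities 1/2 and 1/4.
joint = self_information(0.5 * 0.25)
separate = self_information(0.5) + self_information(0.25)
```

Here `joint` and `separate` agree, which is the additive property, and a less probable event (for example, probability 1/8) yields a larger value than a more probable one.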

The amount of information $I(s_k)$ produced by the source during an arbitrary signaling
interval depends on the symbol $s_k$ emitted by the source at the time. The self-information
$I(s_k)$ is a discrete random variable that takes on the values $I(s_0), I(s_1), \ldots, I(s_{K-1})$
with probabilities $p_0, p_1, \ldots, p_{K-1}$, respectively. The expectation of $I(s_k)$ over all
the probable values taken by the random variable $S$ is given by

$$\begin{aligned}
H(S) &= \mathbb{E}[I(s_k)] \\
     &= \sum_{k=0}^{K-1} p_k I(s_k) \\
     &= \sum_{k=0}^{K-1} p_k \log_2\left(\frac{1}{p_k}\right)
\end{aligned} \tag{5.9}$$

The quantity $H(S)$ is called the entropy, formally defined as follows:

The entropy of a discrete random variable, representing the output of a source of
information, is a measure of the average information content per source symbol.

Note that the entropy $H(S)$ is independent of the symbols of the alphabet $\mathcal{S}$
themselves; it depends only on the probabilities of the symbols in the alphabet
$\mathcal{S}$ of the source.


Building on the definition of entropy given in (5.9), we find that the entropy of the
discrete random variable $S$ is bounded as follows:

$$0 \le H(S) \le \log_2 K \tag{5.10}$$

where $K$ is the number of symbols in the alphabet $\mathcal{S}$.

Elaborating on the two bounds on entropy in (5.10), we now make two statements:

1. $H(S) = 0$, if, and only if, the probability $p_k = 1$ for some $k$, and the remaining
   probabilities in the set are all zero; this lower bound on entropy corresponds to no
   uncertainty.

2. $H(S) = \log_2 K$, if, and only if, $p_k = 1/K$ for all $k$ (i.e., all the symbols in the
   source alphabet $\mathcal{S}$ are equiprobable); this upper bound on entropy corresponds to
   maximum uncertainty.
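The definition in (5.9) and the two bounds in (5.10) lend themselves to a quick numerical check. The sketch below is illustrative only: the function name `entropy` and the three test distributions are our own assumptions, and terms with $p_k = 0$ are taken to contribute zero.

```python
import math

def entropy(probs):
    """H(S) = sum of p_k * log2(1/p_k), in bits; zero-probability terms contribute 0."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must be nonnegative and sum to 1")
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

K = 4
h_certain = entropy([1.0, 0.0, 0.0, 0.0])       # no uncertainty: lower bound
h_uniform = entropy([1.0 / K] * K)              # equiprobable symbols: upper bound
h_skewed = entropy([0.5, 0.25, 0.125, 0.125])   # an intermediate case
```

For this hypothetical four-symbol alphabet, `h_certain` is 0, `h_uniform` equals $\log_2 4 = 2$ bits, and `h_skewed` lies strictly between the two bounds.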

To prove these properties of $H(S)$, we proceed as follows. First, since each probability
$p_k$ is less than or equal to unity, it follows that each term $p_k \log_2(1/p_k)$ in (5.9) is
always nonnegative, so $H(S) \ge 0$. Next, we note that the product term $p_k \log_2(1/p_k)$ is
zero if, and only if, $p_k = 0$ or 1. We therefore deduce that $H(S) = 0$ if, and only if,
$p_k = 0$ or 1 for every $k$; that is, $p_k = 1$ for some $k$ and all the rest are zero. This
completes the proofs of the lower bound in (5.10) and statement 1.

To prove the upper bound in (5.10) and statement 2, we make use of a property of the
natural logarithm:

$$\log_e x \le x - 1, \quad x > 0 \tag{5.11}$$

where $\log_e$ is another way of describing the natural logarithm, commonly denoted by $\ln$;
both notations are used interchangeably. This inequality can be readily verified by plotting
the functions $\ln x$ and $(x - 1)$ versus $x$, as shown in Figure 5.1. Here we see that the line
$y = x - 1$ always lies above the curve $y = \log_e x$. The equality holds only at the point
$x = 1$, where the line is tangential to the curve.

Figure 5.1 Graphs of the functions $x - 1$ and $\log_e x$ versus $x$.

To proceed with the proof, consider first any two different probability distributions
denoted by $p_0, p_1, \ldots, p_{K-1}$ and $q_0, q_1, \ldots, q_{K-1}$ on the alphabet
$\mathcal{S} = \{s_0, s_1, \ldots, s_{K-1}\}$ of a discrete source. We may then define the relative
entropy of these two distributions:

$$D(p\|q) = \sum_{k=0}^{K-1} p_k \log_2\left(\frac{p_k}{q_k}\right) \tag{5.12}$$

Hence, changing to the natural logarithm and using the inequality of (5.11), we may
express the summation on the right-hand side of (5.12) as follows:

$$\begin{aligned}
\sum_{k=0}^{K-1} p_k \log_2\left(\frac{p_k}{q_k}\right)
  &= -\sum_{k=0}^{K-1} p_k \log_2\left(\frac{q_k}{p_k}\right) \\
  &\ge \frac{1}{\ln 2} \sum_{k=0}^{K-1} (p_k - q_k) \\
  &= 0
\end{aligned}$$

where, in the third line of the equation, it is noted that the sums over $p_k$ and $q_k$ are
both equal to unity in accordance with (5.3). We thus have the fundamental property of
probability theory:

$$D(p\|q) \ge 0 \tag{5.13}$$

In words, (5.13) states:

The relative entropy of a pair of discrete distributions is always nonnegative; it is zero
only when the two distributions are identical.

Suppose we next put

$$q_k = \frac{1}{K}, \quad k = 0, 1, \ldots, K-1$$

which corresponds to a source alphabet $\mathcal{S}$ with equiprobable symbols. Using this
distribution in (5.12) yields

$$\begin{aligned}
D(p\|q) &= \sum_{k=0}^{K-1} p_k \log_2 p_k + \log_2 K \sum_{k=0}^{K-1} p_k \\
        &= -H(S) + \log_2 K
\end{aligned}$$

where we have made use of (5.3) and (5.9). Hence, invoking the fundamental inequality of
(5.13), we may finally write

$$H(S) \le \log_2 K$$

Thus, $H(S)$ is always less than or equal to $\log_2 K$. The equality holds if, and only if,
the symbols in the alphabet $\mathcal{S}$ are equiprobable. This completes the proof of (5.10)
and with it the accompanying statements 1 and 2.
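The key step of the proof, namely that the relative entropy against the equiprobable distribution equals $\log_2 K - H(S)$, can be verified numerically. The sketch below is a minimal illustration; the function names and the example distribution `p` are our own assumptions, and the code presumes $q_k > 0$ wherever $p_k > 0$.

```python
import math

def entropy(probs):
    """Entropy in bits; zero-probability terms contribute 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def relative_entropy(p, q):
    """D(p||q) = sum of p_k * log2(p_k / q_k), in bits (assumes q_k > 0 where p_k > 0)."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

p = [0.5, 0.25, 0.125, 0.125]   # hypothetical source distribution
K = len(p)
uniform = [1.0 / K] * K         # equiprobable reference distribution q_k = 1/K

d = relative_entropy(p, uniform)
gap = math.log2(K) - entropy(p)
```

For this example both `d` and `gap` evaluate to 0.25 bit, and `relative_entropy(p, p)` is zero, in line with the statement that equality holds only for identical distributions.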

Example 1: Entropy of Bernoulli Random Variable

To illustrate the properties of $H(S)$ summed up in (5.10), consider the Bernoulli random
variable for which symbol 0 occurs with probability $p_0$ and symbol 1 with probability
$p_1 = 1 - p_0$. The entropy of this random variable is

$$\begin{aligned}
H(S) &= -p_0 \log_2 p_0 - p_1 \log_2 p_1 \\
     &= -p_0 \log_2 p_0 - (1 - p_0) \log_2 (1 - p_0) \text{ bits}
\end{aligned} \tag{5.15}$$

from which we observe the following:

1. When $p_0 = 0$, the entropy $H(S) = 0$; this follows from the fact that $x \log_e x \to 0$ as
   $x \to 0$.

2. When $p_0 = 1$, the entropy $H(S) = 0$.

3. The entropy $H(S)$ attains its maximum value $H_{\max} = 1$ bit when $p_1 = p_0 = 1/2$; that
   is, when symbols 1 and 0 are equally probable.

In other words, $H(S)$ is symmetric about $p_0 = 1/2$.

The function of $p_0$ given on the right-hand side of (5.15) is frequently encountered in
information-theoretic problems. It is customary, therefore, to assign a special symbol to
this function. Specifically, we define

$$H(p_0) = -p_0 \log_2 p_0 - (1 - p_0) \log_2 (1 - p_0) \tag{5.16}$$

We refer to $H(p_0)$ as the entropy function. The distinction between (5.15) and (5.16)
should be carefully noted. The $H(S)$ of (5.15) gives the entropy of the Bernoulli random
variable $S$. The $H(p_0)$ of (5.16), on the other hand, is a function of the prior probability
$p_0$ defined on the interval [0, 1]. Accordingly, we may plot the entropy function $H(p_0)$
versus $p_0$, defined on the interval [0, 1], as shown in Figure 5.2. The curve in Figure 5.2
highlights the observations made under points 1, 2, and 3.

Figure 5.2 Entropy function $H(p_0)$.
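The entropy function of (5.16) is simple to tabulate, which makes the three observations above easy to confirm. The following is a minimal sketch; the name `binary_entropy` and the grid of eleven sample points are illustrative choices, and the endpoints are handled explicitly using the limit $x \log x \to 0$.

```python
import math

def binary_entropy(p0: float) -> float:
    """Entropy function H(p0) in bits, with H(0) = H(1) = 0 by the limiting argument."""
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must lie in [0, 1]")
    if p0 in (0.0, 1.0):
        return 0.0
    return -p0 * math.log2(p0) - (1.0 - p0) * math.log2(1.0 - p0)

# Sample the curve at p0 = 0.0, 0.1, ..., 1.0.
samples = [(i / 10, binary_entropy(i / 10)) for i in range(11)]
peak = max(h for _, h in samples)
```

The sampled values rise from 0 at $p_0 = 0$ to a peak of 1 bit at $p_0 = 1/2$ and fall back to 0 at $p_0 = 1$, with matching values at $p_0$ and $1 - p_0$.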



To add specificity to the discrete source of symbols that has been the focus of attention
so far, we now assume it to be memoryless in the sense that the symbol emitted by the
source at any time is independent of previous and future emissions.

In this context, we often find it useful to consider blocks rather than individual symbols,
with each block consisting of $n$ successive source symbols. We may view each such block
as being produced by an extended source with a source alphabet described by the Cartesian
product of a set $\mathcal{S}^{(n)}$ that has $K^n$ distinct blocks, where $K$ is the number of
distinct symbols in the source alphabet $\mathcal{S}$ of the original source. With the source
symbols being statistically independent, it follows that the probability of a source symbol
in $\mathcal{S}^{(n)}$ is equal to the product of the probabilities of the $n$ source symbols in
$\mathcal{S}$ that constitute a particular source symbol of $\mathcal{S}^{(n)}$. We may thus
intuitively expect that $H(S^{(n)})$, the entropy of the extended source, is equal to $n$ times
$H(S)$, the entropy of the original source. That is, we may write

$$H(S^{(n)}) = nH(S) \tag{5.17}$$

We illustrate the validity of this relationship by way of an example.

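Example 2: Entropy of Extended Source

Consider a discrete memoryless source with source alphabet $\mathcal{S} = \{s_0, s_1, s_2\}$,
whose three distinct symbols have the following probabilities:

$$p_0 = \frac{1}{4}, \quad p_1 = \frac{1}{4}, \quad p_2 = \frac{1}{2}$$

Hence, the use of (5.9) yields the entropy of the discrete random variable $S$ representing
the source as

$$\begin{aligned}
H(S) &= p_0 \log_2\left(\frac{1}{p_0}\right) + p_1 \log_2\left(\frac{1}{p_1}\right) + p_2 \log_2\left(\frac{1}{p_2}\right) \\
     &= \frac{1}{4}\log_2(4) + \frac{1}{4}\log_2(4) + \frac{1}{2}\log_2(2) \\
     &= \frac{3}{2} \text{ bits}
\end{aligned}$$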

Consider next the second-order extension of the source. With the source alphabet
$\mathcal{S}$ consisting of three symbols, it follows that the source alphabet of the extended
source $S^{(2)}$ has nine symbols. The first row of Table 5.1 presents the nine symbols of
$\mathcal{S}^{(2)}$, denoted by $\sigma_0, \sigma_1, \ldots, \sigma_8$. The second row of the table
presents the composition of these nine symbols in terms of the corresponding sequences of
source symbols $s_0$, $s_1$, and $s_2$, taken two at a time. The probabilities of the nine source
symbols of the extended source are presented in the last row of the table.

Table 5.1 Alphabets of second-order extension of a discrete memoryless source

Symbols of S^(2)                         σ0     σ1     σ2     σ3     σ4     σ5     σ6     σ7     σ8
Corresponding sequences of symbols of S  s0s0   s0s1   s0s2   s1s0   s1s1   s1s2   s2s0   s2s1   s2s2
Probability P(σi), i = 0, 1, ..., 8      1/16   1/16   1/8    1/16   1/16   1/8    1/8    1/8    1/4

Accordingly, the use of (5.9) yields the entropy of the extended source as

$$\begin{aligned}
H(S^{(2)}) &= \sum_{i=0}^{8} P(\sigma_i) \log_2\left(\frac{1}{P(\sigma_i)}\right) \\
  &= \frac{1}{16}\log_2(16) + \frac{1}{16}\log_2(16) + \frac{1}{8}\log_2(8) + \frac{1}{16}\log_2(16) \\
  &\quad + \frac{1}{16}\log_2(16) + \frac{1}{8}\log_2(8) + \frac{1}{8}\log_2(8) + \frac{1}{8}\log_2(8) + \frac{1}{4}\log_2(4) \\
  &= 3 \text{ bits}
\end{aligned}$$

We thus see that $H(S^{(2)}) = 2H(S)$ in accordance with (5.17).
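The relationship in (5.17) can also be confirmed by brute force, building the extended alphabet as a Cartesian product and multiplying the symbol probabilities. The sketch below assumes the three-symbol source of this example; the helper names `entropy` and `extended_probs` are illustrative, and the third-order case is included only as an extra check that is not worked in the text.

```python
import math
from itertools import product

def entropy(probs):
    """Entropy in bits; zero-probability terms contribute 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def extended_probs(probs, n):
    """Probabilities of all length-n blocks of a memoryless source."""
    return [math.prod(block) for block in product(probs, repeat=n)]

p = [0.25, 0.25, 0.5]                  # p0, p1, p2 of the example source
h1 = entropy(p)                        # original source
h2 = entropy(extended_probs(p, 2))     # second-order extension (9 blocks)
h3 = entropy(extended_probs(p, 3))     # third-order extension (27 blocks)
```

With these probabilities, `h1`, `h2`, and `h3` come out as 1.5, 3, and 4.5 bits, respectively, which is $nH(S)$ for $n = 1, 2, 3$.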


Source-coding Theorem 


Now that we understand the meaning of entropy of a random variable, we are equipped to 
address an important issue in communication theory: the representation of data generated 
by a discrete source of information. 

The process by which this representation is accomplished is called source encoding. 
The device that performs the representation is called a source encoder. For reasons to be 
described, it may be desirable to know the statistics of the source. In particular, if some 
source symbols are known to be more probable than others, then we may exploit this 
feature in the generation of a source code by assigning short codewords to frequent source 
symbols, and long codewords to rare source symbols. We refer to such a source code as a 
variable-length code. The Morse code, used in telegraphy in the past, is an example of a 
variable-length code. Our primary interest is in the formulation of a source encoder that 
satisfies two requirements: 

1. The codewords produced by the encoder are in binary form.

2. The source code is uniquely decodable, so that the original source sequence can be
   reconstructed perfectly from the encoded binary sequence.

The second requirement is particularly important: it constitutes the basis for a perfect 
source code. 

Consider then the scheme shown in Figure 5.3 that depicts a discrete memoryless
source whose output $s_k$ is converted by the source encoder into a sequence of 0s and 1s,
denoted by $b_k$. We assume that the source has an alphabet with $K$ different symbols and
that the $k$th symbol $s_k$ occurs with probability $p_k$, $k = 0, 1, \ldots, K-1$. Let the binary
codeword assigned to symbol $s_k$ by the encoder have length $l_k$, measured in bits. We define
the average codeword length $\bar{L}$ of the source encoder as

$$\bar{L} = \sum_{k=0}^{K-1} p_k l_k$$

In physical terms, the parameter $\bar{L}$ represents the average number of bits per source
symbol used in the source encoding process. Let $L_{\min}$ denote the minimum possible value
of $\bar{L}$. We then define the coding efficiency of the source encoder as

$$\eta = \frac{L_{\min}}{\bar{L}} \tag{5.19}$$

With $\bar{L} \ge L_{\min}$, we clearly have $\eta \le 1$. The source encoder is said to be
efficient when $\eta$ approaches unity.

But how is the minimum value $L_{\min}$ determined? The answer to this fundamental
question is embodied in Shannon’s first theorem: the source-coding theorem, which may
be stated as follows:

Given a discrete memoryless source whose output is denoted by the random variable $S$, the
entropy $H(S)$ imposes the following bound on the average codeword length $\bar{L}$ for any
source encoding scheme:

$$\bar{L} \ge H(S)$$

According to this theorem, the entropy $H(S)$ represents a fundamental limit on the average
number of bits per source symbol necessary to represent a discrete memoryless source, in
that $\bar{L}$ can be made as small as but no smaller than the entropy $H(S)$. Thus, setting
$L_{\min} = H(S)$, we may rewrite (5.19), defining the efficiency of a source encoder in terms
of the entropy $H(S)$ as shown by

$$\eta = \frac{H(S)}{\bar{L}}$$

where as before we have $\eta \le 1$.
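To make the notions of average codeword length and coding efficiency concrete, the sketch below evaluates both for a hypothetical four-symbol source under two assumed sets of codeword lengths: a fixed-length code and a variable-length code. The source probabilities, the length assignments, and the function names are all illustrative assumptions, not taken from the text.

```python
import math

def entropy(probs):
    """Entropy in bits; zero-probability terms contribute 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def average_length(probs, lengths):
    """Average codeword length: sum of p_k * l_k, in bits per source symbol."""
    return sum(p * l for p, l in zip(probs, lengths))

def coding_efficiency(probs, lengths):
    """Efficiency eta = H(S) / average codeword length."""
    return entropy(probs) / average_length(probs, lengths)

probs = [0.5, 0.25, 0.125, 0.125]    # hypothetical source statistics
fixed_lengths = [2, 2, 2, 2]         # fixed-length code: 2 bits per symbol
variable_lengths = [1, 2, 3, 3]      # short codewords for frequent symbols

eta_fixed = coding_efficiency(probs, fixed_lengths)
eta_variable = coding_efficiency(probs, variable_lengths)
```

For this source the fixed-length code has efficiency 0.875, whereas the variable-length assignment attains an efficiency of 1, since its average length of 1.75 bits equals the source entropy.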

Lossless Data Compression Algorithms 


A common characteristic of signals generated by physical sources is that, in their natural 
form, they contain a significant amount of redundant information, the transmission of 
which is therefore wasteful of primary communication resources. For example, the output 
of a computer used for business transactions constitutes a redundant sequence in the sense 
that any two adjacent symbols are typically correlated with each other. 

For efficient signal transmission, the redundant information should, therefore, be 
removed from the signal prior to transmission. This operation, with no loss of information, 
is ordinarily performed on a signal in digital form, in which case we refer to the operation 
as lossless data compression. The code resulting from such an operation provides a 
representation of the source output that is not only efficient in terms of the average number 
of bits per symbol, but also exact in the sense that the original data can be reconstructed 
with no loss of information. The entropy of the source establishes the fundamental limit on 
the removal of redundancy from the data. Basically, lossless data compression is achieved
by assigning short descriptions to the most frequent outcomes of the source output and
longer descriptions to the less frequent ones.

In this section we discuss some source-coding schemes for lossless data compression. 
We begin the discussion by describing a type of source code known as a prefix code, 
which not only is uniquely decodable, but also offers the possibility of realizing an 
average codeword length that can be made arbitrarily close to the source entropy. 


Consider a discrete memoryless source of alphabet $\{s_0, s_1, \ldots, s_{K-1}\}$ and respective
probabilities $\{p_0, p_1, \ldots, p_{K-1}\}$. For a source code representing the output of this
source to be of practical use, the code has to be uniquely decodable. This restriction
ensures that, for each finite sequence of symbols emitted by the source, the corresponding
sequence of codewords is different from the sequence of codewords corresponding to any
other source sequence. We are specifically interested in a special class of codes satisfying
a restriction known as the prefix condition. To define the prefix condition, let the
codeword assigned to source symbol $s_k$ be denoted by $(m_{k_1}, m_{k_2}, \ldots, m_{k_n})$, where the
individual elements $m_{k_1}, \ldots, m_{k_n}$ are 0s and 1s and $n$ is the codeword length. The
initial part of the codeword is represented by the elements $m_{k_1}, \ldots, m_{k_i}$ for some
$i \le n$. Any sequence made up of the initial part of the codeword is called a prefix of the
codeword. We thus say:

A prefix code is defined as a code in which no codeword is the prefix of any other
codeword.

Prefix codes are distinguished from other uniquely decodable codes by the fact that the
end of a codeword is always recognizable. Hence, the decoding of a prefix code can be
accomplished as soon as the binary sequence representing a source symbol is fully
received. For this reason, prefix codes are also referred to as instantaneous codes.
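The prefix condition can be tested mechanically. The sketch below is one possible check, under the assumption that codewords are given as strings of 0s and 1s; the function name `is_prefix_code` and the three sample codes are hypothetical and serve only to exercise the test.

```python
def is_prefix_code(codewords):
    """Return True if no codeword is a prefix of another codeword.

    After lexicographic sorting, any prefix relation (including a duplicate)
    appears between two neighboring entries, so checking neighbors suffices.
    """
    words = sorted(codewords)
    return all(not later.startswith(earlier)
               for earlier, later in zip(words, words[1:]))

# Three hypothetical binary codes for a four-symbol alphabet.
code_a = ["0", "1", "00", "11"]       # "0" is a prefix of "00": not a prefix code
code_b = ["0", "10", "110", "111"]    # no codeword begins another: prefix code
code_c = ["0", "01", "011", "0111"]   # each codeword begins the next: not a prefix code

results = [is_prefix_code(c) for c in (code_a, code_b, code_c)]
```

Only the second of the three sample codes satisfies the prefix condition, so `results` is `[False, True, False]`.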

Example 3: Illustrative Example of Prefix Coding

To illustrate the meaning of a prefix code, consider the three source codes described in
Table 5.2. Code I is not a prefix code because the bit 0, the codeword for $s_0$, is a prefix of
00, the codeword for $s_2$. Likewise, the bit 1, the codeword for $s_1$, is a prefix of 11, the
codeword for $s_3$. Similarly, we may show that code III is not a prefix code but code II is.

Table 5.2 Illustrating the definition of a prefix code

To decode a sequence of codewords generated from a prefix source code, the source
decoder simply starts at the beginning of the sequence and decodes one codeword at a
time. Specifically, it sets up what is equivalent to a decision tree, which is a graphical
portrayal of the codewords in the particular source code. For example, Figure 5.4 depicts
the decision tree corresponding to code II in Table 5.2. The tree has an initial state and
four terminal states corresponding to source symbols $s_0$, $s_1$, $s_2$, and $s_3$. The decoder
always starts at the initial state. The first received bit moves the decoder to the terminal
state $s_0$ if it is 0 or else to a second decision point if it is 1. In the latter case, the second
bit moves the decoder one step further down the tree, either to terminal state $s_1$ if it is 0
or else to a third decision point if it is 1, and so on. Once each terminal state emits its
symbol, the decoder is reset to its initial state. Note also that each bit in the received
encoded sequence is examined only once. Consider, for example, the following encoded
sequence:


This sequence is readily decoded as the source sequence The reader is 

invited to carry out this decoding. 
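The decision-tree decoding just described can be sketched in a few lines. The code below is a minimal illustration, not a definitive decoder: the codeword assignment for code II (0, 10, 110, 111) is an assumption consistent with the description of the tree above, and the input bit string is a hypothetical test sequence of our own choosing.

```python
def decode_prefix(bits, codebook):
    """Decode a bit string with a prefix code, examining each bit exactly once.

    codebook maps source symbols to codewords. Accumulating bits until they
    match a codeword is equivalent to walking the decision tree from the
    initial state to a terminal state, then resetting.
    """
    inverse = {codeword: symbol for symbol, codeword in codebook.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:          # terminal state reached
            symbols.append(inverse[current])
            current = ""                # reset to the initial state
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return symbols

# Assumed codewords for code II, consistent with the decision tree in the text.
code_ii = {"s0": "0", "s1": "10", "s2": "110", "s3": "111"}

# Hypothetical encoded sequence used only to exercise the decoder.
decoded = decode_prefix("1011111000", code_ii)
```

For this input the decoder emits the symbols s1, s3, s2, s0, s0, splitting the bit string as 10 | 111 | 110 | 0 | 0 without ever looking ahead.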

As mentioned previously, a prefix code has the important property that it is always
uniquely decodable. But the converse is not necessarily true. For example, code III
in Table 5.2 does not satisfy the prefix condition, yet it is uniquely decodable because the
bit 0 indicates the beginning of each codeword in the code.

To probe more deeply into prefix codes, exemplified by that in Table 5.2, we resort to 
an inequality, which is considered next. 


Figure 5.4 Decision tree for code II of Table 5.2.

Consider a discrete memoryless source with source alphabet $\{s_0, s_1, \ldots, s_{K-1}\}$ and
source probabilities $\{p_0, p_1, \ldots, p_{K-1}\}$, with the codeword of symbol $s_k$ having length
$l_k$, $k = 0, 1, \ldots, K-1$. Then, according to the Kraft inequality, the codeword lengths of a
prefix code always satisfy the following inequality:

$$\sum_{k=0}^{K-1} 2^{-l_k} \le 1 \tag{5.22}$$

where the factor 2 refers to the number of symbols in the binary alphabet. The Kraft
inequality is a necessary but not sufficient condition for a source code to be a prefix code.
In other words, the inequality of (5.22) is merely a condition on the codeword lengths of a
prefix code and not on the codewords themselves. For example, referring to the three
codes listed in Table 5.2, we see:

• Code I violates the Kraft inequality; it cannot, therefore, be a prefix code.

• The Kraft inequality is satisfied by both codes II and III, but only code II is a
  prefix code.
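Because the Kraft inequality involves only the codeword lengths, it is easy to evaluate exactly. The sketch below uses exact rational arithmetic; the three sets of lengths are assumptions chosen to be consistent with the description of codes I, II, and III in the text (the codewords themselves are not needed).

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Left-hand side of the Kraft inequality: sum of 2^(-l_k), computed exactly."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

def satisfies_kraft(lengths):
    return kraft_sum(lengths) <= 1

# Assumed codeword lengths for the three codes discussed in the text.
lengths_i = [1, 1, 2, 2]
lengths_ii = [1, 2, 3, 3]
lengths_iii = [1, 2, 3, 4]

sums = [kraft_sum(l) for l in (lengths_i, lengths_ii, lengths_iii)]
```

The three sums are 3/2, 1, and 15/16. The first exceeds unity, so no prefix code can have those lengths; the other two satisfy the inequality, which illustrates why the condition is necessary but says nothing about whether a particular set of codewords is a prefix code.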

Given a discrete memoryless source of entropy $H(S)$, a prefix code can be constructed with
an average codeword length $\bar{L}$, which is bounded as follows:

$$H(S) \le \bar{L} < H(S) + 1 \tag{5.23}$$

The left-hand bound of (5.23) is satisfied with equality under the condition that symbol
$s_k$ is emitted by the source with probability

$$p_k = 2^{-l_k} \tag{5.24}$$

where $l_k$ is the length of the codeword assigned to source symbol $s_k$. A distribution
governed by (5.24) is said to be a dyadic distribution. For this distribution, we naturally
have

$$\sum_{k=0}^{K-1} 2^{-l_k} = \sum_{k=0}^{K-1} p_k = 1$$

Under this condition, the Kraft inequality of (5.22) confirms that we can construct a prefix
code, such that the length of the codeword assigned to source symbol $s_k$ is $-\log_2 p_k$. For
such a code, the average codeword length is

$$\bar{L} = \sum_{k=0}^{K-1} \frac{l_k}{2^{l_k}} \tag{5.25}$$

and the corresponding entropy of the source is

$$\begin{aligned}
H(S) &= \sum_{k=0}^{K-1} \left(\frac{1}{2^{l_k}}\right) \log_2\left(2^{l_k}\right) \\
     &= \sum_{k=0}^{K-1} \frac{l_k}{2^{l_k}}
\end{aligned} \tag{5.26}$$

Hence, in this special (rather meretricious) case, we find from (5.25) and (5.26) that the
prefix code is matched to the source in that $\bar{L} = H(S)$.
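One standard way to see that the bounds in (5.23) are attainable is to assign each symbol a codeword of length $\lceil \log_2(1/p_k) \rceil$. This particular length assignment is an assumption introduced here for illustration; it is not spelled out in the text. The sketch checks it on a dyadic distribution and on a non-dyadic one, both hypothetical.

```python
import math

def entropy(probs):
    """Entropy in bits; zero-probability terms contribute 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def ceil_log_lengths(probs):
    """Codeword lengths l_k = ceil(log2(1/p_k)); these always satisfy the Kraft inequality."""
    return [math.ceil(math.log2(1.0 / p)) for p in probs]

def average_length(probs, lengths):
    return sum(p * l for p, l in zip(probs, lengths))

dyadic = [0.5, 0.25, 0.125, 0.125]      # every p_k is a power of 1/2
general = [0.4, 0.2, 0.2, 0.1, 0.1]     # not dyadic

l_dyadic = ceil_log_lengths(dyadic)
l_general = ceil_log_lengths(general)
avg_dyadic = average_length(dyadic, l_dyadic)
avg_general = average_length(general, l_general)
```

For the dyadic distribution the lengths are exactly $-\log_2 p_k$ and the average length equals the entropy. For the non-dyadic distribution the average length is strictly larger than the entropy but still less than $H(S) + 1$.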

But how do we match the prefix code to an arbitrary discrete memoryless source? The
answer to this basic problem lies in the use of an extended code. Let $\bar{L}_n$ denote the
average codeword length of the extended prefix code. For a uniquely decodable code,
$\bar{L}_n$ is the smallest possible. Applying (5.23) to the extended source, whose entropy is
$nH(S)$ by (5.17), we find that

$$nH(S) \le \bar{L}_n < nH(S) + 1$$

or, equivalently,

$$H(S) \le \frac{\bar{L}_n}{n} < H(S) + \frac{1}{n} \tag{5.28}$$

In the limit, as $n$ approaches infinity, the lower and upper bounds in (5.28) converge as
shown by

$$\lim_{n \to \infty} \frac{1}{n} \bar{L}_n = H(S)$$

We may, therefore, make the statement:

By making the order $n$ of an extended prefix source encoder large enough, we can make the
code represent the discrete memoryless source as closely as desired.

In other words, the average codeword length of an extended prefix code can be made as
small as the entropy of the source, provided that the extended code has a high enough
order in accordance with the source-coding theorem. However, the price we have to pay
for decreasing the average codeword length is increased decoding complexity, which is
brought about by the high order of the extended prefix code.
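The squeeze in (5.28) can be observed numerically. The sketch below is only an illustration under stated assumptions: it uses a hypothetical binary source with probabilities 0.7 and 0.3, and it assigns each block of the extended source a length of $\lceil \log_2(1/q) \rceil$, where $q$ is the block probability. That length assignment is not the optimal prefix code discussed in the text, but it satisfies the same pair of bounds.

```python
import math
from itertools import product

def entropy(probs):
    """Entropy in bits; zero-probability terms contribute 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def per_symbol_length(probs, n):
    """Average codeword length per source symbol for the nth extension,
    using block lengths ceil(log2(1/q)) (an assumed, non-optimal assignment)."""
    total = 0.0
    for block in product(probs, repeat=n):
        q = math.prod(block)
        total += q * math.ceil(math.log2(1.0 / q))
    return total / n

p = [0.7, 0.3]                      # hypothetical binary memoryless source
h = entropy(p)
rates = {n: per_symbol_length(p, n) for n in (1, 2, 4, 8)}
```

Each entry of `rates` lies between $H(S)$ and $H(S) + 1/n$, so the gap to the entropy is at most 1, 1/2, 1/4, and 1/8 bit per symbol for the four orders tried, at the cost of a code book that grows as $2^n$ blocks.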


We next describe an important class of prefix codes known as Huffman codes. The basic 
idea behind Huffman coding is the construction of a simple algorithm that computes an 
optimal prefix code for a given distribution, optimal in the sense that the code has the 
shortest expected length. The end result is a source code whose average codeword length 
approaches the fundamental limit set by the entropy of a discrete memoryless source, 
namely H(S). The essence of the algorithm used to synthesize the Huffman code is to 
replace the prescribed set of source statistics of a discrete memoryless source with a 
simpler one. This reduction process is continued in a step-by-step manner until we are left 
with a final set of only two source statistics (symbols), for which (0, 1) is an optimal code. 
Starting from this trivial code, we then work backward and thereby construct the Huffman 
code for the given source. 

To be specific, the Huffman encoding algorithm proceeds as follows: 

1. The source symbols are listed in order of decreasing probability. The two source
   symbols of lowest probability are assigned 0 and 1. This part of the step is referred
   to as the splitting stage.

2. These two source symbols are then combined into a new source symbol with
   probability equal to the sum of the two original probabilities. (The list of source
   symbols, and, therefore, source statistics, is thereby reduced in size by one.) The
   probability of the new symbol is placed in the list in accordance with its value.

3. The procedure is repeated until we are left with a final list of source statistics
   (symbols) of only two for which the symbols 0 and 1 are assigned.

4. The code for each (original) source symbol is found by working backward and tracing
   the sequence of 0s and 1s assigned to that symbol as well as its successors.
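The four steps above can be sketched compactly with a priority queue. This is an equivalent formulation rather than a literal transcription of the list-based procedure: instead of re-sorting the list at each step, the two least probable entries are popped from a heap, and the 0/1 labels are prepended to the codewords of the symbols being combined. The function name, the symbol names, and the probabilities (0.4, 0.2, 0.2, 0.1, 0.1) are illustrative assumptions.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a binary Huffman code. probs maps symbols to probabilities;
    the result maps symbols to codeword strings."""
    order = count()  # tie-breaker so the heap never compares dictionaries
    heap = [(p, next(order), {symbol: ""}) for symbol, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate single-symbol source
        return {symbol: "0" for symbol in probs}
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # least probable entry
        p1, _, group1 = heapq.heappop(heap)  # next least probable entry
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(order), merged))
    return heap[0][2]

probs = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}
code = huffman_code(probs)
average = sum(probs[s] * len(code[s]) for s in probs)
```

For these probabilities the average codeword length is 2.2 bits per symbol. The individual codewords depend on how ties are broken and on which branch receives 0 or 1, so only the average length and the prefix property should be relied upon.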

Example 4: Huffman Tree

To illustrate the construction of a Huffman code, consider the five symbols of the alphabet
of a discrete memoryless source and their probabilities, which are shown in the two
leftmost columns of Figure 5.5a. Following through the Huffman algorithm, we reach the
end of the computation in four steps, resulting in a Huffman tree similar to that shown in
Figure 5.5; the Huffman tree is not to be confused with the decision tree discussed
previously in Figure 5.4. The codewords of the Huffman code for the source are tabulated
in Figure 5.5b. The average codeword length is, therefore,

$$\begin{aligned}
\bar{L} &= 0.4(2) + 0.2(2) + 0.2(2) + 0.1(3) + 0.1(3) \\
        &= 2.2 \text{ binary symbols}
\end{aligned}$$

The entropy of the specified discrete memoryless source is calculated as follows (see (5.9)):

$$\begin{aligned}
H(S) &= 0.4\log_2\left(\frac{1}{0.4}\right) + 0.2\log_2\left(\frac{1}{0.2}\right) + 0.2\log_2\left(\frac{1}{0.2}\right) + 0.1\log_2\left(\frac{1}{0.1}\right) + 0.1\log_2\left(\frac{1}{0.1}\right) \\
     &= 0.529 + 0.464 + 0.464 + 0.332 + 0.332 \\
     &= 2.121 \text{ bits}
\end{aligned}$$

For this example, we may make two observations:

1. The average codeword length $\bar{L}$ exceeds the entropy $H(S)$ by only about 3.7%.

2. The average codeword length $\bar{L}$ does indeed satisfy (5.23).

Figure 5.5 (a) Example of the Huffman encoding algorithm. (b) Source code.


It is noteworthy that the Huffman encoding process (i.e., the Huffman tree) is not unique.
In particular, we may cite two variations in the process that are responsible for the
nonuniqueness of the Huffman code. First, at each splitting stage in the construction of a
Huffman code, there is arbitrariness in the way the symbols 0 and 1 are assigned to the last
two source symbols. Whichever way the assignments are made, however, the resulting
differences are trivial. Second, ambiguity arises when the probability of a combined
symbol (obtained by adding the last two probabilities pertinent to a particular step) is
found to equal another probability in the list. We may proceed by placing the probability
of the new symbol as high as possible, as in Example 4. Alternatively, we may place it as
low as possible. (It is presumed that whichever way the placement is made, high or low, it
is consistently adhered to throughout the encoding process.) In this second case, noticeable
differences arise in that the codewords in the resulting source code can have different
lengths. Nevertheless, the average codeword length remains the same.

As a measure of the variability in codeword lengths of a source code, we define the
variance of the average codeword length $\bar{L}$ over the ensemble of source symbols as

$$\sigma^2 = \sum_{k=0}^{K-1} p_k \left(l_k - \bar{L}\right)^2$$

where $p_0, p_1, \ldots, p_{K-1}$ are the source statistics and $l_k$ is the length of the codeword
assigned to source symbol $s_k$. It is usually found that when a combined symbol is moved
as high as possible, the resulting Huffman code has a significantly smaller variance
$\sigma^2$ than when it is moved as low as possible. On this basis, it is reasonable to choose
the former Huffman code over the latter.
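The effect of the placement rule on the variance can be illustrated with two sets of codeword lengths for the source with probabilities 0.4, 0.2, 0.2, 0.1, 0.1. The lengths (2, 2, 2, 3, 3) are those used in the average-length calculation of the Huffman example; the lengths (1, 2, 3, 4, 4) are an assumed alternative that results from placing each combined symbol as low as possible. The function name is our own.

```python
def length_stats(probs, lengths):
    """Return (average codeword length, variance of codeword length)."""
    mean = sum(p * l for p, l in zip(probs, lengths))
    variance = sum(p * (l - mean) ** 2 for p, l in zip(probs, lengths))
    return mean, variance

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
lengths_high = [2, 2, 2, 3, 3]   # combined symbols placed as high as possible
lengths_low = [1, 2, 3, 4, 4]    # assumed result of placing them as low as possible

mean_high, var_high = length_stats(probs, lengths_high)
mean_low, var_low = length_stats(probs, lengths_low)
```

Both codes have the same average length of 2.2 bits, but the variance is 0.16 for the first and 1.36 for the second, which is the kind of difference that motivates preferring the "as high as possible" placement.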


A drawback of the Huffman code is that it requires knowledge of a probabilistic model of 
the source; unfortunately, in practice, source statistics are not always known a priori. 
Moreover, in the modeling of text we find that storage requirements prevent the Huffman 
code from capturing the higher-order relationships between words and phrases because the 
codebook grows exponentially fast in the size of each super-symbol of letters (i.e., 
grouping of letters); the efficiency of the code is therefore compromised. To overcome 
these practical limitations of Huffman codes, we may use the Lempel-Ziv algorithm,
which is intrinsically adaptive and simpler to implement than Huffman coding.

Basically, the idea behind encoding in the Lempel-Ziv algorithm is described as
follows:

The source data stream is parsed into segments that are the shortest subsequences not
encountered previously.

To illustrate this simple yet elegant idea, consider the example of the binary sequence 

000101110010100101 ... 

It is assumed that the binary symbols 0 and 1 are already stored in that order in the code 
book. We thus write 

Subsequences stored: 0, 1 

Data to be parsed: 000101110010100101 ... 

The encoding process begins at the left. With symbols 0 and 1 already stored, the shortest
subsequence of the data stream encountered for the first time and not seen before is 00; so
we write

Subsequences stored: 0, 1, 00

Data to be parsed: 0101110010100101 ...

The second shortest subsequence not seen before is 01; accordingly, we go on to write

Subsequences stored: 0, 1, 00, 01

Data to be parsed: 01110010100101 ...

The next shortest subsequence not encountered previously is 011; hence, we write

Subsequences stored: 0, 1, 00, 01, 011

Data to be parsed: 10010100101 ...

We continue in the manner described here until the given data stream has been completely 
parsed. Thus, for the example at hand, we get the code book of binary subsequences 
shown in the second row of Figure 5.6. 

The first row shown in this figure merely indicates the numerical positions of the 
individual subsequences in the code book. We now recognize that the first subsequence of 
the data stream, 00, is made up of the concatenation of the first code book entry, 0, with 
itself; it is, therefore, represented by the number 11. The second subsequence of the data 
stream, 01, consists of the first code book entry, 0, concatenated with the second code book 
entry, 1; it is, therefore, represented by the number 12. The remaining subsequences are 
treated in a similar fashion. The complete set of numerical representations for the various 
subsequences in the code book is shown in the third row of Figure 5.6. As a further example 
illustrating the composition of this row, we note that the subsequence 010 consists of the 
concatenation of the subsequence 01 in position 4 and symbol 0 in position 1; hence, the 
numerical representation is 41. The last row shown in Figure 5.6 is the binary encoded 
representation of the different subsequences of the data stream. 
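The parsing and encoding steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the code book is initialized with 0 and 1, pointers are written with three bits (enough for the nine code book positions of this example), and any trailing bits that merely repeat a stored subsequence are left unparsed. The function names are hypothetical.

```python
def lz_parse(data, initial=("0", "1")):
    """Parse data into the shortest subsequences not encountered previously."""
    book = list(initial)
    phrases = []
    start = 0
    while start < len(data):
        end = start + 1
        while end <= len(data) and data[start:end] in book:
            end += 1
        if end > len(data):
            break  # remaining bits repeat a stored subsequence; left unparsed here
        phrase = data[start:end]
        book.append(phrase)
        phrases.append(phrase)
        start = end
    return book, phrases

def lz_encode(data, pointer_bits=3):
    """Return the code book, numerical representations, and binary encoded blocks."""
    book, phrases = lz_parse(data)
    numeric, blocks = [], []
    for phrase in phrases:
        root, innovation = phrase[:-1], phrase[-1]
        pointer = book.index(root) + 1               # 1-based code book position
        numeric.append(f"{pointer}{book.index(innovation) + 1}")
        blocks.append(format(pointer, f"0{pointer_bits}b") + innovation)
    return book, numeric, blocks

book, numeric, blocks = lz_encode("000101110010100101")
```

For the sequence 000101110010100101 this yields the code book 0, 1, 00, 01, 011, 10, 010, 100, 101, the numerical representations 11, 12, 42, 21, 41, 61, 62, and the four-bit blocks 0010, 0011, 1001, 0100, 1000, 1100, 1101.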

The last symbol of each subsequence in the code book (i.e., the second row of Figure 
5.6) is an innovation symbol , which is so called in recognition of the fact that its 
appendage to a particular subsequence distinguishes it from all previous subsequences 
stored in the code book. Correspondingly, the last bit of each uniform block of bits in the 
binary encoded representation of the data stream (i.e., the fourth row in Figure 5.6) 
represents the innovation symbol for the particular subsequence under consideration. The 
remaining bits provide the equivalent binary representation of the “pointer” to the root 
subsequence that matches the one in question, except for the innovation symbol. 

The Lempel-Ziv decoder is just as simple as the encoder. Specifically, it uses the
pointer to identify the root subsequence and then appends the innovation symbol.
Consider, for example, the binary encoded block 1101 in position 9. The last bit, 1, is the
innovation symbol. The remaining bits, 110, point to the root subsequence 10 in position
6. Hence, the block 1101 is decoded into 101, which is correct.
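
The parsing, encoding, and decoding steps described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`lz_parse`, `lz_encode`, `lz_decode`), the assumption that the symbols 0 and 1 are already stored in positions 1 and 2, and the 3-bit pointer width are choices made here to match the worked example, not part of any standard implementation.

```python
def lz_parse(bits, initial=("0", "1")):
    """Return the code book built by parsing `bits` into new subsequences."""
    book = list(initial)  # code book position p (1-based) is book[p - 1]
    start = 0
    while start < len(bits):
        end = start + 1
        # Extend the candidate until it is a subsequence not yet stored.
        while end <= len(bits) and bits[start:end] in book:
            end += 1
        if end > len(bits):
            break  # leftover tail is already in the book; left unparsed here
        book.append(bits[start:end])
        start = end
    return book


def lz_encode(book, n_initial=2, pointer_bits=3):
    """Encode each parsed subsequence as pointer-to-root plus innovation symbol."""
    blocks = []
    for entry in book[n_initial:]:
        root, innovation = entry[:-1], entry[-1]
        pointer = book.index(root) + 1  # 1-based position of the root subsequence
        blocks.append(format(pointer, f"0{pointer_bits}b") + innovation)
    return blocks


def lz_decode(blocks, initial=("0", "1")):
    """Rebuild the data stream: look up each root, then append the innovation bit."""
    book = list(initial)
    out = []
    for block in blocks:
        pointer, innovation = int(block[:-1], 2), block[-1]
        entry = book[pointer - 1] + innovation
        book.append(entry)  # the decoder grows the same code book as the encoder
        out.append(entry)
    return "".join(out)


data = "000101110010100101"
book = lz_parse(data)
blocks = lz_encode(book)
print(book)    # code book entries in positions 1 to 9
print(blocks)  # 4-bit blocks: 3-bit pointer followed by the innovation bit
print(lz_decode(blocks) == data)
```

For the example sequence, the block produced for the subsequence in position 9 is 1101, in agreement with the decoding example in the text.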

From the example described here, we note that, in contrast to Huffman coding, the
Lempel-Ziv algorithm uses fixed-length codes to represent a variable number of source
symbols; this feature makes the Lempel-Ziv code suitable for synchronous transmission.
In practice, fixed blocks 12 bits long are used, which implies a code book of $2^{12} = 4096$
entries.

Figure 5.6 Illustrating the encoding process performed by the Lempel-Ziv algorithm
on the binary sequence 000101110010100101 ...

For a long time, Huffman coding was unchallenged as the algorithm of choice for
lossless data compression; Huffman coding is still optimal, but in practice it is hard to 
implement. It is on account of practical implementation that the Lempel-Ziv algorithm 
has taken over almost completely from the Huffman algorithm. The Lempel-Ziv 
algorithm is now the standard algorithm for file compression. 

Discrete Memoryless Channels 


Up to this point in the chapter we have been preoccupied with discrete memoryless 
sources responsible for information generation. We next consider the related issue of 
information transmission. To this end, we start the discussion by considering a discrete 
memoryless channel, the counterpart of a discrete memoryless source. 

A discrete memoryless channel is a statistical model with an input X and an output Y that
is a noisy version of X; both X and Y are random variables. Every unit of time, the channel
accepts an input symbol X selected from an alphabet $\mathcal{X}$ and, in response, it emits an output
symbol Y from an alphabet $\mathcal{Y}$. The channel is said to be “discrete” when both of the alphabets
$\mathcal{X}$ and $\mathcal{Y}$ have finite sizes. It is said to be “memoryless” when the current output symbol
depends only on the current input symbol and not any previous or future symbol.

Figure 5.7a shows a view of a discrete memoryless channel. The channel is described in
terms of an input alphabet

$$\mathcal{X} = \{x_0, x_1, \ldots, x_{J-1}\}$$

and an output alphabet

$$\mathcal{Y} = \{y_0, y_1, \ldots, y_{K-1}\}$$

Figure 5.7 (a) Discrete memoryless channel; (b) Simplified graphical representation of the channel.

The cardinality of the alphabets $\mathcal{X}$ and $\mathcal{Y}$, or any other alphabet for that matter, is defined
as the number of elements in the alphabet. Moreover, the channel is characterized by a set
of transition probabilities

$$p(y_k \mid x_j) = P(Y = y_k \mid X = x_j) \quad \text{for all } j \text{ and } k$$

for which, according to probability theory, we naturally have

$$0 \le p(y_k \mid x_j) \le 1 \quad \text{for all } j \text{ and } k$$

and

$$\sum_{k=0}^{K-1} p(y_k \mid x_j) = 1 \quad \text{for fixed } j \tag{5.35}$$


When the number of input symbols J and the number of output symbols K are not large, we
may depict the discrete memoryless channel graphically in another way, as shown in Figure
5.7b. In this latter depiction, each input-output symbol pair (x, y), characterized by the
transition probability p(y|x) > 0, is joined together by a line labeled with the number p(y|x).

Also, the input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ need not have the same size; hence
the use of J for the size of $\mathcal{X}$ and K for the size of $\mathcal{Y}$. For example, in channel coding, the
size K of the output alphabet $\mathcal{Y}$ may be larger than the size J of the input alphabet $\mathcal{X}$; thus,
K ≥ J. On the other hand, we may have a situation in which the channel emits the same
symbol when either one of two input symbols is sent, in which case we have K ≤ J.

A convenient way of describing a discrete memoryless channel is to arrange the various 
transition probabilities of the channel in the form of a matrix 



$$\mathbf{P} = \begin{bmatrix}
p(y_0 \mid x_0) & p(y_1 \mid x_0) & \cdots & p(y_{K-1} \mid x_0) \\
p(y_0 \mid x_1) & p(y_1 \mid x_1) & \cdots & p(y_{K-1} \mid x_1) \\
\vdots & \vdots & & \vdots \\
p(y_0 \mid x_{J-1}) & p(y_1 \mid x_{J-1}) & \cdots & p(y_{K-1} \mid x_{J-1})
\end{bmatrix}$$

The J-by-K matrix P is called the channel matrix, or stochastic matrix. Note that each row
of the channel matrix P corresponds to a fixed channel input, whereas each column of the
matrix corresponds to a fixed channel output. Note also that a fundamental property of the
channel matrix P, as defined here, is that the sum of the elements along any row of the
stochastic matrix is always equal to one, according to (5.35).

Suppose now that the inputs to a discrete memoryless channel are selected according to
the probability distribution $\{p(x_j),\ j = 0, 1, \ldots, J-1\}$. In other words, the event that the
channel input $X = x_j$ occurs with probability

$$p(x_j) = P(X = x_j) \quad \text{for } j = 0, 1, \ldots, J-1$$

Having specified the random variable X denoting the channel input, we may now specify
the second random variable Y denoting the channel output. The joint probability
distribution of the random variables X and Y is given by

$$\begin{aligned}
p(x_j, y_k) &= P(X = x_j, Y = y_k) \\
&= P(Y = y_k \mid X = x_j)\,P(X = x_j) \\
&= p(y_k \mid x_j)\,p(x_j)
\end{aligned} \tag{5.38}$$

The marginal probability distribution of the output random variable Y is obtained by
averaging out the dependence of $p(x_j, y_k)$ on $x_j$, obtaining

$$\begin{aligned}
p(y_k) &= P(Y = y_k) \\
&= \sum_{j=0}^{J-1} P(Y = y_k \mid X = x_j)\,P(X = x_j) \\
&= \sum_{j=0}^{J-1} p(y_k \mid x_j)\,p(x_j) \quad \text{for } k = 0, 1, \ldots, K-1
\end{aligned} \tag{5.39}$$

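The probabilities $p(x_j)$ for $j = 0, 1, \ldots, J-1$ are known as the prior probabilities of the
various input symbols. Equation (5.39) states that, given the prior probabilities $p(x_j)$ and
the channel matrix of transition probabilities $p(y_k \mid x_j)$, we may calculate the probabilities
$p(y_k)$ of the various output symbols.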
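
As a small numerical illustration of (5.35), (5.38), and (5.39), the following Python sketch computes the joint and output distributions from a prior and a channel matrix. The function name `joint_and_output` and the 2-by-3 channel matrix are hypothetical choices made for this example only.

```python
def joint_and_output(prior, P):
    """Return the joint distribution p(x_j, y_k) and the output distribution p(y_k)."""
    J, K = len(P), len(P[0])
    # Each row of the channel matrix must sum to one, as required by (5.35).
    for row in P:
        assert abs(sum(row) - 1.0) < 1e-12
    # Joint probabilities p(x_j, y_k) = p(y_k | x_j) p(x_j), as in (5.38).
    joint = [[P[j][k] * prior[j] for k in range(K)] for j in range(J)]
    # Output probabilities p(y_k) = sum over j of p(y_k | x_j) p(x_j), as in (5.39).
    p_y = [sum(joint[j][k] for j in range(J)) for k in range(K)]
    return joint, p_y


# Hypothetical channel with J = 2 inputs and K = 3 outputs (illustrative numbers).
P = [[0.8, 0.1, 0.1],
     [0.1, 0.1, 0.8]]
prior = [0.5, 0.5]

joint, p_y = joint_and_output(prior, P)
print(p_y)  # approximately [0.45, 0.1, 0.45]
```

Note that K > J in this example, which the definitions above permit.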


Binary Symmetric Channel 

The binary symmetric channel is of theoretical interest and practical importance. It is a
special case of the discrete memoryless channel with J = K = 2. The channel has two input
symbols ($x_0 = 0$, $x_1 = 1$) and two output symbols ($y_0 = 0$, $y_1 = 1$). The channel is symmetric
because the probability of receiving 1 if 0 is sent is the same as the probability of receiving
0 if 1 is sent. This conditional probability of error is denoted by p (i.e., the probability of a
bit flipping). The transition probability diagram of a binary symmetric channel is as
shown in Figure 5.8. Correspondingly, we may express the stochastic matrix as

$$\mathbf{P} = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$

Mutual Information


Given that we think of the channel output Y (selected from alphabet $\mathcal{Y}$) as a noisy version
of the channel input X (selected from alphabet $\mathcal{X}$) and that the entropy H(X) is a measure of
the prior uncertainty about X, how can we measure the uncertainty about X after observing
Y? To answer this basic question, we extend the ideas developed in Section 5.2 by defining
the conditional entropy of X selected from alphabet $\mathcal{X}$ given $Y = y_k$. Specifically, we write

$$H(X \mid Y = y_k) = \sum_{j=0}^{J-1} p(x_j \mid y_k) \log_2\left(\frac{1}{p(x_j \mid y_k)}\right)$$

This quantity is itself a random variable that takes on the values $H(X \mid Y = y_0), \ldots,
H(X \mid Y = y_{K-1})$ with probabilities $p(y_0), \ldots, p(y_{K-1})$, respectively. The expectation of
entropy $H(X \mid Y = y_k)$ over the output alphabet $\mathcal{Y}$ is therefore given by

$$\begin{aligned}
H(X \mid Y) &= \sum_{k=0}^{K-1} H(X \mid Y = y_k)\,p(y_k) \\
&= \sum_{k=0}^{K-1}\sum_{j=0}^{J-1} p(x_j \mid y_k)\,p(y_k) \log_2\left(\frac{1}{p(x_j \mid y_k)}\right) \\
&= \sum_{k=0}^{K-1}\sum_{j=0}^{J-1} p(x_j, y_k) \log_2\left(\frac{1}{p(x_j \mid y_k)}\right)
\end{aligned} \tag{5.41}$$

where, in the last line, we used the definition of the probability of the joint event ($X = x_j$,
$Y = y_k$) as shown by

$$p(x_j, y_k) = p(x_j \mid y_k)\,p(y_k) \tag{5.42}$$

The quantity $H(X \mid Y)$ in (5.41) is called the conditional entropy. It is the average amount
of uncertainty remaining about the channel input after the channel output has been observed.


The conditional entropy H(X|Y) relates the channel output Y to the channel input X. The
entropy H(X) defines the entropy of the channel input X by itself. Given these two
entropies, we now introduce the definition

$$I(X;Y) = H(X) - H(X \mid Y) \tag{5.43}$$

which is called the mutual information of the channel. To add meaning to this new
concept, we recognize that the entropy H(X) accounts for the uncertainty about the
channel input before observing the channel output and the conditional entropy H(X|Y)
accounts for the uncertainty about the channel input after observing the channel output.
We may, therefore, go on to make the statement: the mutual information I(X;Y) is a
measure of the uncertainty about the channel input that is resolved by observing the
channel output.




Equation (5.43) is not the only way of defining the mutual information of a channel.
Rather, we may define it in another way, as shown by

$$I(Y;X) = H(Y) - H(Y \mid X) \tag{5.44}$$

on the basis of which we may make the next statement: the mutual information I(Y;X) is a
measure of the uncertainty about the channel output that is resolved by sending the
channel input.

On first sight, the two definitions of (5.43) and (5.44) look different. In reality, however,
they embody equivalent statements on the mutual information of the channel that are
worded differently. More specifically, they could be used interchangeably, as
demonstrated next.

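Property 1: Symmetry

The mutual information of a channel is symmetric in the sense that

$$I(X;Y) = I(Y;X) \tag{5.45}$$

To prove this property, we first use the formula for entropy and then use (5.35) and (5.38),
in that order, obtaining

$$\begin{aligned}
H(X) &= \sum_{j=0}^{J-1} p(x_j) \log_2\left(\frac{1}{p(x_j)}\right) \\
&= \sum_{j=0}^{J-1} p(x_j) \log_2\left(\frac{1}{p(x_j)}\right) \sum_{k=0}^{K-1} p(y_k \mid x_j) \\
&= \sum_{j=0}^{J-1}\sum_{k=0}^{K-1} p(y_k \mid x_j)\,p(x_j) \log_2\left(\frac{1}{p(x_j)}\right) \\
&= \sum_{j=0}^{J-1}\sum_{k=0}^{K-1} p(x_j, y_k) \log_2\left(\frac{1}{p(x_j)}\right)
\end{aligned} \tag{5.46}$$

where, in going from the third to the final line, we made use of the definition of a joint
probability. Hence, substituting (5.41) and (5.46) into (5.43) and then combining terms,
we obtain

$$I(X;Y) = \sum_{j=0}^{J-1}\sum_{k=0}^{K-1} p(x_j, y_k) \log_2\left(\frac{p(x_j \mid y_k)}{p(x_j)}\right) \tag{5.47}$$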







Note that the double summation on the right-hand side of (5.47) is invariant with respect
to swapping the x and y. In other words, the symmetry of the mutual information I(X;Y) is
already evident from (5.47).

To further confirm this property, we may use Bayes’ rule for conditional probabilities,
previously discussed in Chapter 3, to write

$$\frac{p(x_j \mid y_k)}{p(x_j)} = \frac{p(y_k \mid x_j)}{p(y_k)} \tag{5.48}$$

Hence, substituting (5.48) into (5.47) and interchanging the order of summation, we get

$$\begin{aligned}
I(X;Y) &= \sum_{k=0}^{K-1}\sum_{j=0}^{J-1} p(x_j, y_k) \log_2\left(\frac{p(y_k \mid x_j)}{p(y_k)}\right) \\
&= I(Y;X)
\end{aligned} \tag{5.49}$$

which proves Property 1.


Property 2: Nonnegativity

The mutual information is always nonnegative; that is,

$$I(X;Y) \ge 0 \tag{5.50}$$

To prove this property, we first note from (5.42) that

$$p(x_j \mid y_k) = \frac{p(x_j, y_k)}{p(y_k)} \tag{5.51}$$

Hence, substituting (5.51) into (5.47), we may express the mutual information of the
channel as

$$I(X;Y) = \sum_{j=0}^{J-1}\sum_{k=0}^{K-1} p(x_j, y_k) \log_2\left(\frac{p(x_j, y_k)}{p(x_j)\,p(y_k)}\right) \tag{5.52}$$

Next, a direct application of the fundamental inequality of (5.12) on relative entropy
confirms (5.50), with equality if, and only if,

$$p(x_j, y_k) = p(x_j)\,p(y_k) \quad \text{for all } j \text{ and } k \tag{5.53}$$

In words, Property 2 states the following: we cannot lose information, on the average, by
observing the output of a channel.

Moreover, the mutual information is zero if, and only if, the input and output symbols of
the channel are statistically independent; that is, when (5.53) is satisfied.

Property 3: Expansion of the Mutual Information

The mutual information of a channel is related to the joint entropy of the channel input
and channel output by

$$I(X;Y) = H(X) + H(Y) - H(X,Y) \tag{5.54}$$

where the joint entropy H(X, Y) is defined by

$$H(X,Y) = \sum_{j=0}^{J-1}\sum_{k=0}^{K-1} p(x_j, y_k) \log_2\left(\frac{1}{p(x_j, y_k)}\right) \tag{5.55}$$

To prove (5.54), we first rewrite the joint entropy in the equivalent form

$$\begin{aligned}
H(X,Y) &= \sum_{j=0}^{J-1}\sum_{k=0}^{K-1} p(x_j, y_k) \log_2\left(\frac{p(x_j)\,p(y_k)}{p(x_j, y_k)}\right) \\
&\quad + \sum_{j=0}^{J-1}\sum_{k=0}^{K-1} p(x_j, y_k) \log_2\left(\frac{1}{p(x_j)\,p(y_k)}\right)
\end{aligned} \tag{5.56}$$

The first double summation term on the right-hand side of (5.56) is recognized as the
negative of the mutual information of the channel, I(X;Y), previously given in (5.52). As
for the second summation term, we manipulate it as follows:

$$\begin{aligned}
\sum_{j=0}^{J-1}\sum_{k=0}^{K-1} p(x_j, y_k) \log_2\left(\frac{1}{p(x_j)\,p(y_k)}\right)
&= \sum_{j=0}^{J-1} \log_2\left(\frac{1}{p(x_j)}\right) \sum_{k=0}^{K-1} p(x_j, y_k)
 + \sum_{k=0}^{K-1} \log_2\left(\frac{1}{p(y_k)}\right) \sum_{j=0}^{J-1} p(x_j, y_k) \\
&= \sum_{j=0}^{J-1} p(x_j) \log_2\left(\frac{1}{p(x_j)}\right)
 + \sum_{k=0}^{K-1} p(y_k) \log_2\left(\frac{1}{p(y_k)}\right) \\
&= H(X) + H(Y)
\end{aligned} \tag{5.57}$$

where, for the first term, we made use of the following relationship from probability theory:

$$\sum_{k=0}^{K-1} p(x_j, y_k) = p(x_j)$$

and a similar relationship holds for the second term.

Accordingly, using (5.52) and (5.57) in (5.56), we get the result

$$H(X,Y) = -I(X;Y) + H(X) + H(Y)$$

which, on rearrangement, proves Property 3.
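
The three properties can be checked numerically for any particular channel. The Python sketch below evaluates I(X;Y) from the double summation of (5.52) and compares it with the expressions in (5.43), (5.44), and (5.54). The function names and the 2-by-2 channel matrix are hypothetical, chosen only for illustration.

```python
from math import log2


def entropy(dist):
    """Entropy in bits of a discrete distribution; zero-probability terms are skipped."""
    return sum(p * log2(1.0 / p) for p in dist if p > 0)


def channel_entropies(prior, P):
    """Compute the entropies and mutual information for prior p(x_j) and channel matrix P."""
    J, K = len(P), len(P[0])
    joint = [[P[j][k] * prior[j] for k in range(K)] for j in range(J)]
    p_y = [sum(joint[j][k] for j in range(J)) for k in range(K)]
    pairs = [(j, k) for j in range(J) for k in range(K) if joint[j][k] > 0]
    # H(X|Y) as in (5.41), using p(x_j | y_k) = p(x_j, y_k) / p(y_k).
    h_x_given_y = sum(joint[j][k] * log2(p_y[k] / joint[j][k]) for j, k in pairs)
    # H(Y|X): each row of P is the output distribution for a fixed input.
    h_y_given_x = sum(prior[j] * entropy(P[j]) for j in range(J))
    # I(X;Y) from the double summation of (5.52).
    i_xy = sum(joint[j][k] * log2(joint[j][k] / (prior[j] * p_y[k])) for j, k in pairs)
    return {
        "H(X)": entropy(prior),
        "H(Y)": entropy(p_y),
        "H(X,Y)": entropy([joint[j][k] for j in range(J) for k in range(K)]),
        "H(X|Y)": h_x_given_y,
        "H(Y|X)": h_y_given_x,
        "I(X;Y)": i_xy,
    }


# Hypothetical asymmetric binary channel and prior (illustrative numbers only).
e = channel_entropies([0.3, 0.7], [[0.9, 0.1], [0.2, 0.8]])
print(e["I(X;Y)"])
print(e["H(X)"] - e["H(X|Y)"])             # definition (5.43)
print(e["H(Y)"] - e["H(Y|X)"])             # definition (5.44)
print(e["H(X)"] + e["H(Y)"] - e["H(X,Y)"])  # expansion (5.54)
```

The four printed values should agree to within floating-point rounding. Setting both rows of the channel matrix equal makes the input and output independent, in which case the mutual information is zero.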

We conclude our discussion of the mutual information of a channel by providing a
diagrammatic interpretation in Figure 5.9 of (5.43), (5.44), and (5.54).

Figure 5.9 Illustrating the relations among various channel entropies.

Channel Capacity


The concept of entropy introduced in Section 5.2 prepared us for formulating Shannon’s 
first theorem: the source-coding theorem. To set the stage for formulating Shannon’s 
second theorem, namely the channel-coding theorem, this section introduces the concept of 
capacity, which, as mentioned previously, defines the intrinsic ability of a communication 
channel to convey information. 

To proceed, consider a discrete memoryless channel with input alphabet $\mathcal{X}$, output
alphabet $\mathcal{Y}$, and transition probabilities $p(y_k \mid x_j)$, where j = 0, 1, ..., J - 1 and k = 0, 1, ...,
K - 1. The mutual information of the channel is defined by the first line of (5.49), which is
reproduced here for convenience:

$$I(X;Y) = \sum_{k=0}^{K-1}\sum_{j=0}^{J-1} p(x_j, y_k) \log_2\left(\frac{p(y_k \mid x_j)}{p(y_k)}\right)$$

where, according to (5.38),

$$p(x_j, y_k) = p(y_k \mid x_j)\,p(x_j)$$

Also, from (5.39), we have

$$p(y_k) = \sum_{j=0}^{J-1} p(y_k \mid x_j)\,p(x_j)$$

Putting these three equations into a single equation, we write

$$I(X;Y) = \sum_{k=0}^{K-1}\sum_{j=0}^{J-1} p(y_k \mid x_j)\,p(x_j) \log_2\left(\frac{p(y_k \mid x_j)}{\sum_{j'=0}^{J-1} p(y_k \mid x_{j'})\,p(x_{j'})}\right)$$


Careful examination of the double summation in this equation reveals two different
probabilities, on which the essence of mutual information I(X;Y) depends:

• the probability distribution $\{p(x_j)\}_{j=0}^{J-1}$ that characterizes the channel input and

• the transition probability distribution $\{p(y_k \mid x_j)\}_{j=0,\,k=0}^{j=J-1,\,k=K-1}$ that characterizes the
channel itself.

These two probability distributions are obviously independent of each other. Thus, given a
channel characterized by the transition probability distribution $\{p(y_k \mid x_j)\}$, we may now
introduce the channel capacity, which is formally defined in terms of the mutual
information between the channel input and output as follows:

$$C = \max_{\{p(x_j)\}} I(X;Y) \quad \text{bits per channel use} \tag{5.59}$$
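The maximization in (5.59) is performed, subject to two input probabilistic constraints:

$$p(x_j) \ge 0 \quad \text{for all } j$$

and

$$\sum_{j=0}^{J-1} p(x_j) = 1$$

Accordingly, we make the following statement: the channel capacity of a discrete
memoryless channel is the maximum mutual information I(X;Y) in any single use of the
channel, where the maximization is over all possible input probability distributions
$\{p(x_j)\}$ on $\mathcal{X}$.

The channel capacity is clearly an intrinsic property of the channel.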

Binary Symmetric Channel (Revisited) 

Consider again the binary symmetric channel, which is described by the transition 
probability diagram of Figure 5.8. This diagram is uniquely defined by the conditional 
probability of error p. 

From Example 1 we recall that the entropy H(X) is maximized when the channel input
probability $p(x_0) = p(x_1) = 1/2$, where $x_0$ and $x_1$ are each 0 or 1. Hence, invoking the
defining equation (5.59), we find that the mutual information I(X;Y) is similarly
maximized and thus write

$$C = I(X;Y)\big|_{p(x_0) = p(x_1) = 1/2}$$

From Figure 5.8 we have

$$p(y_0 \mid x_1) = p(y_1 \mid x_0) = p$$

and

$$p(y_0 \mid x_0) = p(y_1 \mid x_1) = 1 - p$$

Therefore, substituting these channel transition probabilities into (5.49) with J = K = 2 and
then setting the input probability $p(x_0) = p(x_1) = 1/2$ in (5.59), we find that the capacity of
the binary symmetric channel is

$$C = 1 + p\log_2 p + (1-p)\log_2(1-p) \tag{5.60}$$

Moreover, using the definition of the entropy function introduced in (5.16), we may reduce
(5.60) to

$$C = 1 - H(p)$$

The channel capacity C varies with the probability of error (i.e., transition probability) p in
a convex manner as shown in Figure 5.10, which is symmetric about p = 1/2. Comparing
the curve in this figure with that in Figure 5.2, we make two observations:

1. When the channel is noise free, permitting us to set p = 0, the channel capacity C
attains its maximum value of one bit per channel use, which is exactly the
information in each channel input. At this value of p, the entropy function H(p)
attains its minimum value of zero.

2. When the conditional probability of error p = 1/2 due to channel noise, the channel
capacity C attains its minimum value of zero, whereas the entropy function H(p)
attains its maximum value of unity; in such a case, the channel is said to be useless in
the sense that the channel input and output assume statistically independent structures.

Figure 5.10 Variation of channel capacity of a binary symmetric channel with
transition probability p.
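
The capacity formula for the binary symmetric channel is easy to verify numerically. The Python sketch below evaluates I(X;Y) in the form H(Y) - H(Y|X) of (5.44) over a grid of input probabilities and compares the maximum with C = 1 - H(p). The function names and the grid resolution are assumptions made for this illustration; the value p = 0.01 is just a sample error probability.

```python
from math import log2


def binary_entropy(p):
    """Entropy function H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def bsc_mutual_information(q, p):
    """I(X;Y) = H(Y) - H(Y|X) for a BSC with P(X = 0) = q and error probability p."""
    p_y0 = q * (1.0 - p) + (1.0 - q) * p  # probability that the output is 0
    # For the BSC, H(Y|X) = H(p) whatever the input distribution.
    return binary_entropy(p_y0) - binary_entropy(p)


def bsc_capacity(p):
    """Capacity of the BSC in bits per channel use: C = 1 - H(p)."""
    return 1.0 - binary_entropy(p)


p = 0.01
grid = [k / 100 for k in range(101)]  # candidate values of P(X = 0)
best_q = max(grid, key=lambda q: bsc_mutual_information(q, p))
print(best_q)                             # maximizing input probability on the grid
print(bsc_mutual_information(best_q, p))  # maximum mutual information on the grid
print(bsc_capacity(p))                    # closed-form capacity 1 - H(p)
```

On this grid the maximum occurs at equiprobable inputs, and the maximum agrees with the closed-form capacity. The two limiting cases p = 0 and p = 1/2 give capacities of one and zero, respectively, as noted in the observations above.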


Channel-coding Theorem 


With the entropy of a discrete memoryless source and the corresponding capacity of a 
discrete memoryless channel at hand, we are now equipped with the concepts needed for 
formulating Shannon’s second theorem: the channel-coding theorem. 

To this end, we first recognize that the inevitable presence of noise in a channel causes
discrepancies (errors) between the output and input data sequences of a digital
communication system. For a relatively noisy channel (e.g., wireless communication
channel), the probability of error may reach a value as high as $10^{-1}$, which means that (on the
average) only 9 out of 10 transmitted bits are received correctly. For many applications, this
level of reliability is utterly unacceptable. Indeed, a probability of error equal to $10^{-6}$ or even
lower is often a necessary practical requirement. To achieve such a high level of
performance, we resort to the use of channel coding.

The design goal of channel coding is to increase the resistance of a digital communication 
system to channel noise. Specifically, channel coding consists of mapping the incoming data 
sequence into a channel input sequence and inverse mapping the channel output sequence 
into an output data sequence in such a way that the overall effect of channel noise on the 
system is minimized. The first mapping operation is performed in the transmitter by a 
channel encoder, whereas the inverse mapping operation is performed in the receiver by a 
channel decoder, as shown in the block diagram of Figure 5.11; to simplify the exposition, 
we have not included source encoding (before channel encoding) and source decoding (after 
channel decoding) in this figure. 


Figure 5.11 Block diagram of digital communication system.

The channel encoder and channel decoder in Figure 5.11 are both under the designer’s
control and should be designed to optimize the overall reliability of the communication 
system. The approach taken is to introduce redundancy in the channel encoder in a 
controlled manner, so as to reconstruct the original source sequence as accurately as 
possible. In a rather loose sense, we may thus view channel coding as the dual of source 
coding, in that the former introduces controlled redundancy to improve reliability whereas 
the latter reduces redundancy to improve efficiency. 

Treatment of the channel-coding techniques is deferred to Chapter 10. For the purpose
of our present discussion, it suffices to confine our attention to block codes. In this class of
codes, the message sequence is subdivided into sequential blocks each k bits long, and
each k-bit block is mapped into an n-bit block, where n > k. The number of redundant bits
added by the encoder to each transmitted block is n - k bits. The ratio k/n is called the code
rate. Using r to denote the code rate, we write

$$r = \frac{k}{n}$$

where, of course, r is less than unity. For a prescribed k, the code rate r (and, therefore, the
system’s coding efficiency) approaches zero as the block length n approaches infinity.

The accurate reconstruction of the original source sequence at the destination requires
that the average probability of symbol error be arbitrarily low. This raises the following
important question: does a channel-coding scheme exist such that the probability of a
message bit being in error is smaller than any prescribed positive number, while the code
rate need not be too small?

The answer to this fundamental question is an emphatic “yes.” Indeed, the answer to the
question is provided by Shannon’s second theorem in terms of the channel capacity C, as
described in what follows.

Up until this point, time has not played an important role in our discussion of channel
capacity. Suppose then the discrete memoryless source in Figure 5.11 has the source
alphabet $\mathcal{S}$ and entropy H(S) bits per source symbol. We assume that the source emits
symbols once every $T_s$ seconds. Hence, the average information rate of the source is $H(S)/T_s$
bits per second. The decoder delivers decoded symbols to the destination from the source
alphabet $\mathcal{S}$ and at the same source rate of one symbol every $T_s$ seconds. The discrete
memoryless channel has a channel capacity equal to C bits per use of the channel. We
assume that the channel is capable of being used once every $T_c$ seconds. Hence, the
channel capacity per unit time is $C/T_c$ bits per second, which represents the maximum rate
of information transfer over the channel. With this background, we are now ready to state
Shannon’s second theorem, the channel-coding theorem, in two parts as follows:

1. Let a discrete memoryless source with an alphabet $\mathcal{S}$ have entropy H(S) for random
variable S and produce symbols once every $T_s$ seconds. Let a discrete memoryless
channel have capacity C and be used once every $T_c$ seconds. Then, if

$$\frac{H(S)}{T_s} \le \frac{C}{T_c} \tag{5.62}$$

there exists a coding scheme for which the source output can be transmitted over the
channel and be reconstructed with an arbitrarily small probability of error. The
parameter $C/T_c$ is called the critical rate; when (5.62) is satisfied with the equality
sign, the system is said to be signaling at the critical rate.

2. Conversely, if

$$\frac{H(S)}{T_s} > \frac{C}{T_c}$$

it is not possible to transmit information over the channel and reconstruct it with an
arbitrarily small probability of error.

The channel-coding theorem is the single most important result of information theory. The 
theorem specifies the channel capacity C as a fundamental limit on the rate at which the 
transmission of reliable error-free messages can take place over a discrete memoryless 
channel. However, it is important to note two limitations of the theorem: 

The channel-coding theorem does not show us how to construct a good code. Rather, 
the theorem should be viewed as an existence proof in the sense that it tells us that if 
the condition of (5.62) is satisfied, then good codes do exist. Later, in Chapter 10, we 
describe good codes for discrete memoryless channels. 

The theorem does not have a precise result for the probability of symbol error after 
decoding the channel output. Rather, it tells us that the probability of symbol error 
tends to zero as the length of the code increases, again provided that the condition of 
(5.62) is satisfied. 


Consider a discrete memoryless source that emits equally likely binary symbols (0s and
1s) once every $T_s$ seconds. With the source entropy equal to one bit per source symbol (see
Example 1), the information rate of the source is $(1/T_s)$ bits per second. The source
sequence is applied to a channel encoder with code rate r. The channel encoder produces a
symbol once every $T_c$ seconds. Hence, the encoded symbol transmission rate is $(1/T_c)$
symbols per second. The channel encoder engages a binary symmetric channel once every
$T_c$ seconds. Hence, the channel capacity per unit time is $(C/T_c)$ bits per second, where C is
determined by the prescribed channel transition probability p in accordance with (5.60).
Accordingly, part (1) of the channel-coding theorem implies that if

$$\frac{1}{T_s} \le \frac{C}{T_c} \tag{5.63}$$

then the probability of error can be made arbitrarily low by the use of a suitable
channel-encoding scheme. But the ratio $T_c/T_s$ equals the code rate of the channel encoder:

$$r = \frac{T_c}{T_s}$$

Hence, we may restate the condition of (5.63) simply as

$$r \le C$$

That is, for r ≤ C, there exists a code (with code rate less than or equal to channel capacity
C) capable of achieving an arbitrarily low probability of error.


Repetition Code 

In this example we present a graphical interpretation of the channel-coding theorem. We also 
bring out a surprising aspect of the theorem by taking a look at a simple coding scheme. 

Consider first a binary symmetric channel with transition probability $p = 10^{-2}$. For this
value of p, we find from (5.60) that the channel capacity C = 0.9192. Hence, from the
channel-coding theorem, we may state that, for any ε > 0 and r ≤ 0.9192, there exists a
code of large enough length n, code rate r, and an appropriate decoding algorithm such
that, when the coded bit stream is sent over the given channel, the average probability of
channel decoding error is less than ε. This result is depicted in Figure 5.12 for the limiting
value $\varepsilon = 10^{-8}$.

To put the significance of this result in perspective, consider next a simple coding
scheme that involves the use of a repetition code, in which each bit of the message is
repeated several times. Let each bit (0 or 1) be repeated n times, where n = 2m + 1 is an
odd integer. For example, for n = 3, we transmit 0 and 1 as 000 and 111, respectively.


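Figure 5.12 Illustrating the significance of the channel-coding theorem. (Recoverable
labels: code rate, r; channel capacity C; limiting value $\varepsilon = 10^{-8}$.)

Intuitively, it would seem logical to use a majority rule for decoding, which operates as
follows: if, in a block of n repeated bits (representing one bit of the message), the number
of 0s exceeds the number of 1s, the decoder decides in favor of a 0; otherwise, it decides in
favor of a 1.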


Hence, an error occurs when m + 1 or more bits out of n = 2m + 1 bits are received
incorrectly. Because of the assumed symmetric nature of the channel, the average
probability of error, denoted by $P_e$, is independent of the prior probabilities of 0 and 1.
Accordingly, we find that $P_e$ is given by

$$P_e = \sum_{i=m+1}^{n} \binom{n}{i} p^i (1-p)^{n-i} \tag{5.65}$$

where p is the transition probability of the channel.

Table 5.3 gives the average probability of error $P_e$ for a repetition code that is
calculated by using (5.65) for different values of the code rate r. The values given here
assume the use of a binary symmetric channel with transition probability $p = 10^{-2}$. The
improvement in reliability displayed in Table 5.3 is achieved at the cost of decreasing code
rate. The results of this table are also shown plotted as the curve labeled “repetition code”
in Figure 5.12. This curve illustrates the exchange of code rate for message reliability,
which is a characteristic of repetition codes.

This example highlights the unexpected result presented to us by the channel-coding 
theorem. The result is that it is not necessary to have the code rate r approach zero (as in 
the case of repetition codes) to achieve more and more reliable operation of the 
communication link. The theorem merely requires that the code rate be less than the 
channel capacity C. 
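
The trade-off between code rate and reliability for the repetition code can be reproduced directly from (5.65). The Python sketch below is a minimal illustration; the function name and the particular list of block lengths are choices made here, and the printed values are computed from the formula rather than copied from Table 5.3.

```python
from math import comb


def repetition_error_probability(n, p):
    """Average error probability (5.65) of an n-fold repetition code with majority decoding."""
    if n % 2 == 0:
        raise ValueError("n must be odd, n = 2m + 1")
    m = (n - 1) // 2
    # An error occurs when m + 1 or more of the n transmitted bits are flipped.
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(m + 1, n + 1))


p = 1e-2  # transition probability of the binary symmetric channel
for n in (1, 3, 5, 7, 9, 11):
    print(f"r = 1/{n}: P_e = {repetition_error_probability(n, p):.3e}")
```

Each increase in n lowers the error probability but also lowers the code rate r = 1/n, which is the behavior that the channel-coding theorem shows to be unnecessary for rates below capacity.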


Table 5.3 Average probability of error for repetition code

Differential Entropy and Mutual Information for Continuous Random Ensembles


The sources and channels considered in our discussion of information-theoretic concepts 
thus far have involved ensembles of random variables that are discrete in amplitude. In this 
section, we extend these concepts to continuous random variables. The motivation for 
doing so is to pave the way for the description of another fundamental limit in information 
theory, which we take up in Section 5.10. 

Consider a continuous random variable X with the probability density function $f_X(x)$.
By analogy with the entropy of a discrete random variable, we introduce the following
definition:

$$h(X) = \int_{-\infty}^{\infty} f_X(x) \log_2\left[\frac{1}{f_X(x)}\right] dx \tag{5.66}$$

We refer to the new term h(X) as the differential entropy of X to distinguish it from the
ordinary or absolute entropy. We do so in recognition of the fact that, although h(X) is a
useful mathematical quantity to know, it is not in any sense a measure of the randomness
of X. Nevertheless, we justify the use of (5.66) in what follows. We begin by viewing the
continuous random variable X as the limiting form of a discrete random variable that
assumes the value $x_k = k\,\Delta x$, where k = 0, ±1, ±2, ..., and $\Delta x$ approaches zero. By
definition, the continuous random variable X assumes a value in the interval $[x_k, x_k + \Delta x]$
with probability $f_X(x_k)\,\Delta x$. Hence, permitting $\Delta x$ to approach zero, the ordinary entropy of
the continuous random variable X takes the limiting form

$$\begin{aligned}
H(X) &= \lim_{\Delta x \to 0} \sum_{k=-\infty}^{\infty} f_X(x_k)\,\Delta x \log_2\left(\frac{1}{f_X(x_k)\,\Delta x}\right) \\
&= \lim_{\Delta x \to 0} \left[\sum_{k=-\infty}^{\infty} f_X(x_k) \log_2\left(\frac{1}{f_X(x_k)}\right) \Delta x - \log_2 \Delta x \sum_{k=-\infty}^{\infty} f_X(x_k)\,\Delta x\right] \\
&= \int_{-\infty}^{\infty} f_X(x) \log_2\left(\frac{1}{f_X(x)}\right) dx - \lim_{\Delta x \to 0}\left(\log_2 \Delta x \int_{-\infty}^{\infty} f_X(x)\,dx\right) \\
&= h(X) - \lim_{\Delta x \to 0} \log_2 \Delta x
\end{aligned} \tag{5.67}$$

In the last line of (5.67), use has been made of (5.66) and the fact that the total area under
the curve of the probability density function $f_X(x)$ is unity. In the limit as $\Delta x$ approaches
zero, the term $-\log_2 \Delta x$ approaches infinity. This means that the entropy of a continuous
random variable is infinitely large. Intuitively, we would expect this to be true because a
continuous random variable may assume a value anywhere in the interval $(-\infty, \infty)$; we
may, therefore, encounter an uncountably infinite number of probable outcomes. To avoid the
problem associated with the term $\log_2 \Delta x$, we adopt h(X) as a differential entropy, with the
term $-\log_2 \Delta x$ serving merely as a reference. Moreover, since the information transmitted
over a channel is actually the difference between two entropy terms that have a common
reference, the information will be the same as the difference between the corresponding
differential entropy terms. We are, therefore, perfectly justified in using the term h(X),
defined in (5.66), as the differential entropy of the continuous random variable X.

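When we have a continuous random vector X consisting of n random variables $X_1, X_2,
\ldots, X_n$, we define the differential entropy of X as the n-fold integral

$$h(\mathbf{X}) = \int_{-\infty}^{\infty} f_{\mathbf{X}}(\mathbf{x}) \log_2\left[\frac{1}{f_{\mathbf{X}}(\mathbf{x})}\right] d\mathbf{x}$$

where $f_{\mathbf{X}}(\mathbf{x})$ is the joint probability density function of X.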


Uniform Distribution 

To illustrate the notion of differential entropy, consider a random variable X uniformly
distributed over the interval (0, a). The probability density function of X is

$$f_X(x) = \begin{cases} \dfrac{1}{a}, & 0 < x < a \\ 0, & \text{otherwise} \end{cases}$$

Applying (5.66) to this distribution, we get

$$\begin{aligned}
h(X) &= \int_0^a \frac{1}{a} \log_2(a)\,dx \\
&= \log_2 a
\end{aligned}$$

Note that $\log_2 a < 0$ for a < 1. Thus, this example shows that, unlike a discrete random
variable, the differential entropy of a continuous random variable can assume a negative value.


In (5.12) we defined the relative entropy of a pair of different discrete distributions. To
extend that definition to a pair of continuous distributions, consider the continuous random
variables X and Y whose respective probability density functions are denoted by $f_X(x)$ and
$f_Y(x)$ for the same sample value (argument) x. The relative entropy of the random
variables X and Y is defined by

$$D(f_Y \,\|\, f_X) = \int_{-\infty}^{\infty} f_Y(x) \log_2\left[\frac{f_Y(x)}{f_X(x)}\right] dx \tag{5.70}$$

where $f_X(x)$ is viewed as the “reference” distribution. In a corresponding way to the
fundamental property of (5.13), we have

$$D(f_Y \,\|\, f_X) \ge 0 \tag{5.71}$$

Combining (5.70) and (5.71) into a single inequality, we may thus write

$$\int_{-\infty}^{\infty} f_Y(x) \log_2\left[\frac{1}{f_Y(x)}\right] dx \le \int_{-\infty}^{\infty} f_Y(x) \log_2\left[\frac{1}{f_X(x)}\right] dx$$

The expression on the left-hand side of this inequality is recognized as the differential
entropy of the random variable Y, namely h(Y). Accordingly,

$$h(Y) \le \int_{-\infty}^{\infty} f_Y(x) \log_2\left[\frac{1}{f_X(x)}\right] dx \tag{5.72}$$

The next example illustrates an insightful application of (5.72). 


Gaussian Distribution 

Suppose two random variables, X and Y, are described as follows:

• the random variables X and Y have the common mean $\mu$ and variance $\sigma^2$;

• the random variable X is Gaussian distributed (see Section 3.9) as shown by

\[
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \tag{5.73}
\]

Hence, substituting (5.73) into (5.72) and changing the base of the logarithm from 2 to
e = 2.7183, we get

\[
h(Y) \le -\log_2 e \int_{-\infty}^{\infty} f_Y(x) \left[-\frac{(x-\mu)^2}{2\sigma^2} - \ln\!\left(\sqrt{2\pi}\,\sigma\right)\right] dx \tag{5.74}
\]


where e is the base of the natural logarithm. We now recognize the following
characterizations of the random variable Y (given that its mean is $\mu$ and its variance is $\sigma^2$):

\[
\int_{-\infty}^{\infty} f_Y(x)\, dx = 1
\]

\[
\int_{-\infty}^{\infty} (x-\mu)^2 f_Y(x)\, dx = \sigma^2
\]

We may, therefore, simplify (5.74) as

\[
h(Y) \le \frac{1}{2} \log_2(2\pi e \sigma^2) \tag{5.75}
\]

The quantity on the right-hand side of (5.75) is, in fact, the differential entropy of the
Gaussian random variable X:

\[
h(X) = \frac{1}{2} \log_2(2\pi e \sigma^2) \tag{5.76}
\]

Finally, combining (5.75) and (5.76), we may write

\[
h(Y) \le h(X), \qquad \begin{cases} X: \text{Gaussian random variable} \\ Y: \text{non-Gaussian random variable} \end{cases}
\]

where equality holds if, and only if, Y has the same probability density function as X.

We may now summarize the results of this important example by describing two
entropic properties of a random variable:

Property 1: For any finite variance, a Gaussian random variable has the largest differential
entropy attainable by any random variable.

Property 2: The entropy of a Gaussian random variable is uniquely determined by its variance
(i.e., the entropy is independent of the mean).

Indeed, it is because of Property 1 that the Gaussian channel model is so widely used as a
conservative model in the study of digital communication systems.
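Property 1 can be illustrated by comparing closed-form differential entropies at equal variance. The sketch below is only an illustration under stated assumptions: the uniform and Laplacian competitors are our choice, not the text's, and their entropy formulas ($\log_2(\sigma\sqrt{12})$ for a uniform density of width $\sigma\sqrt{12}$, and $\log_2(2eb)$ with $b = \sigma/\sqrt{2}$ for a Laplacian density) are standard results quoted without derivation.

```python
import math


def gaussian_entropy_bits(variance: float) -> float:
    """h(X) = 0.5 * log2(2*pi*e*sigma^2), as in (5.76)."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * variance)


def uniform_entropy_bits(variance: float) -> float:
    """Uniform density of width a has variance a^2 / 12 and entropy log2(a)."""
    width = math.sqrt(12.0 * variance)
    return math.log2(width)


def laplace_entropy_bits(variance: float) -> float:
    """Laplacian density of scale b has variance 2*b^2 and entropy log2(2*e*b)."""
    scale = math.sqrt(variance / 2.0)
    return math.log2(2.0 * math.e * scale)


if __name__ == "__main__":
    for var in (0.1, 1.0, 10.0):
        print(
            f"variance {var}: Gaussian {gaussian_entropy_bits(var):.4f}, "
            f"Laplacian {laplace_entropy_bits(var):.4f}, "
            f"uniform {uniform_entropy_bits(var):.4f} bits"
        )
```

At every variance the Gaussian value is the largest of the three, and none of the functions takes the mean as an argument, in line with Property 2.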


Continuing with the information-theoretic characterization of continuous random
variables, we may use analogy with (5.47) to define the mutual information between the
pair of continuous random variables X and Y as follows:

\[
I(X;Y) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f_{X,Y}(x,y) \log_2\!\left[\frac{f_X(x|y)}{f_X(x)}\right] dx\, dy
\]

where $f_{X,Y}(x,y)$ is the joint probability density function of X and Y and $f_X(x|y)$ is the
conditional probability density function of X given Y = y. Also, by analogy with (5.45),
(5.50), (5.43), and (5.44), we find that the mutual information between the pair of continuous
random variables has the following properties:

\[
I(X;Y) = I(Y;X)
\]

\[
I(X;Y) \ge 0
\]

\[
\begin{aligned}
I(X;Y) &= h(X) - h(X|Y) \\
       &= h(Y) - h(Y|X)
\end{aligned} \tag{5.81}
\]

The parameter h(X) is the differential entropy of X; likewise for h(Y). The parameter
h(X|Y) is the conditional differential entropy of X given Y; it is defined by the double
integral (see (5.41))

\[
h(X|Y) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f_{X,Y}(x,y) \log_2\!\left[\frac{1}{f_X(x|y)}\right] dx\, dy
\]

The parameter h(Y|X) is the conditional differential entropy of Y given X; it is defined in a
manner similar to h(X|Y).


Information Capacity Law 


In this section we use our knowledge of probability theory to expand Shannon’s channel- 
coding theorem, so as to formulate the information capacity for a band-limited, power- 
limited Gaussian channel, depicted in Figure 5.13. To be specific, consider a zero-mean 
stationary process X(t) that is band-limited to B hertz. Let $X_k$, $k = 1, 2, \ldots, K$, denote the
continuous random variables obtained by uniform sampling of the process X(t) at a rate of 
2 B samples per second. The rate 2 B samples per second is the smallest permissible rate for 
a bandwidth B that would not result in a loss of information in accordance with the 
sampling theorem; this is discussed in Chapter 6. Suppose that these samples are
transmitted in T seconds over a noisy channel, also band-limited to B hertz. Hence, the
total number of samples K is given by

\[
K = 2BT \tag{5.83}
\]

[Figure 5.13: Model of discrete-time, memoryless Gaussian channel, showing the input $X_k$, the additive noise, and the output.]

We refer to $X_k$ as a sample of the transmitted signal. The channel output is perturbed by
additive white Gaussian noise (AWGN) of zero mean and power spectral density $N_0/2$.
The noise is band-limited to B hertz. Let the continuous random variables $Y_k$, $k = 1, 2, \ldots, K$,
denote the corresponding samples of the channel output, as shown by

\[
Y_k = X_k + N_k, \qquad k = 1, 2, \ldots, K \tag{5.84}
\]

The noise sample $N_k$ in (5.84) is Gaussian with zero mean and variance

\[
\sigma^2 = N_0 B \tag{5.85}
\]

We assume that the samples $Y_k$, $k = 1, 2, \ldots, K$, are statistically independent.

A channel for which the noise and the received signal are as described in (5.84) and
(5.85) is called a discrete-time, memoryless Gaussian channel, modeled as shown in
Figure 5.13. To make meaningful statements about the channel, however, we have to
assign a cost to each channel input. Typically, the transmitter is power limited; therefore, it
is reasonable to define the cost as

\[
\mathbb{E}[X_k^2] \le P, \qquad k = 1, 2, \ldots, K \tag{5.86}
\]

where P is the average transmitted power. The power-limited Gaussian channel described 
herein is not only of theoretical importance but also of practical importance, in that it 
models many communication channels, including line-of-sight radio and satellite links. 

The information capacity of the channel is defined as the maximum of the mutual
information between the channel input $X_k$ and the channel output $Y_k$ over all distributions
of the input $X_k$ that satisfy the power constraint of (5.86). Let $I(X_k;Y_k)$ denote the mutual
information between $X_k$ and $Y_k$. We may then define the information capacity of the
channel as

\[
C = \max_{f_{X_k}(x)} I(X_k;Y_k), \quad \text{subject to the constraint } \mathbb{E}[X_k^2] = P \text{ for all } k \tag{5.87}
\]

In words, maximization of the mutual information $I(X_k;Y_k)$ is done with respect to all
probability distributions of the channel input $X_k$, satisfying the power constraint $\mathbb{E}[X_k^2] = P$.

The mutual information $I(X_k;Y_k)$ can be expressed in one of the two equivalent forms
shown in (5.81). For the purpose at hand, we use the second line of this equation to write

\[
I(X_k;Y_k) = h(Y_k) - h(Y_k|X_k) \tag{5.88}
\]

Since $X_k$ and $N_k$ are independent random variables and their sum equals $Y_k$ in accordance
with (5.84), we find that the conditional differential entropy of $Y_k$ given $X_k$ is equal to the
differential entropy of $N_k$, as shown by

\[
h(Y_k|X_k) = h(N_k) \tag{5.89}
\]

Hence, we may rewrite (5.88) as

\[
I(X_k;Y_k) = h(Y_k) - h(N_k) \tag{5.90}
\]

With $h(N_k)$ being independent of the distribution of $X_k$, it follows that maximizing $I(X_k;Y_k)$
in accordance with (5.87) requires maximizing the differential entropy $h(Y_k)$. For $h(Y_k)$ to
be maximum, $Y_k$ has to be a Gaussian random variable. That is to say, samples of the
channel output represent a noiselike process. Next, we observe that since $N_k$ is Gaussian
by assumption, the sample $X_k$ of the channel input must be Gaussian too. We may
therefore state that the maximization specified in (5.87) is attained by choosing samples of
the channel input from a noiselike Gaussian-distributed process of average power P.
Correspondingly, we may reformulate (5.87) as

\[
C = I(X_k;Y_k) \quad \text{for Gaussian } X_k \text{ and } \mathbb{E}[X_k^2] = P \text{ for all } k \tag{5.91}
\]

where the mutual information $I(X_k;Y_k)$ is defined in accordance with (5.90).

For evaluation of the information capacity C, we now proceed in three stages:

1. The variance of sample $Y_k$ of the channel output equals $P + \sigma^2$, which is a
consequence of the fact that the random variables $X_k$ and $N_k$ are statistically
independent; hence, the use of (5.76) yields the differential entropy

\[
h(Y_k) = \frac{1}{2} \log_2[2\pi e (P + \sigma^2)] \tag{5.92}
\]

2. The variance of the noise sample $N_k$ equals $\sigma^2$; hence, the use of (5.76) yields the
differential entropy

\[
h(N_k) = \frac{1}{2} \log_2(2\pi e \sigma^2) \tag{5.93}
\]

3. Substituting (5.92) and (5.93) into (5.90), and recognizing the definition of
information capacity given in (5.91), we get the formula:

\[
C = \frac{1}{2} \log_2\!\left(1 + \frac{P}{\sigma^2}\right) \text{ bits per channel use} \tag{5.94}
\]

With the channel used K times for the transmission of K samples of the process X(t) in
T seconds, we find that the information capacity per unit time is (K/T) times the result
given in (5.94). The number K equals 2BT, as in (5.83). Accordingly, we may express the
information capacity of the channel in the following equivalent form:

\[
C = B \log_2\!\left(1 + \frac{P}{N_0 B}\right) \text{ bits per second} \tag{5.95}
\]

where $N_0 B$ is the total noise power at the channel output, defined in accordance with (5.85).
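Based on the formula of (5.95), we may now make the following statement: the information
capacity of a continuous channel of bandwidth B hertz, perturbed by AWGN of power
spectral density $N_0/2$ and limited in bandwidth to B, is $C = B \log_2[1 + P/(N_0 B)]$ bits per
second, where P is the average transmitted power.

The information capacity law of (5.95) is one of the most remarkable results of
Shannon’s information theory. In a single formula, it highlights most vividly the interplay
among three key system parameters: channel bandwidth, average transmitted power, and
power spectral density of channel noise. Note, however, that the dependence of
information capacity C on channel bandwidth B is linear, whereas its dependence on
signal-to-noise ratio $P/(N_0 B)$ is logarithmic. Accordingly, we may make another insightful
statement: it is easier to increase the information capacity of a communication channel by
expanding its bandwidth than by increasing the transmitted power for a prescribed noise variance.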


The information capacity formula implies that, for given average transmitted power P and 
channel bandwidth B, we can transmit information at the rate of C bits per second, as 
defined in (5.95), with arbitrarily small probability of error by employing a sufficiently 
complex encoding system. It is not possible to transmit at a rate higher than C bits per 
second by any encoding system without a definite probability of error. Hence, the channel 
capacity law defines the fundamental limit on the permissible rate of error-free 
transmission for a power-limited, band-limited Gaussian channel. To approach this limit, 
however, the transmitted signal must have statistical properties approximating those of 
white Gaussian noise. 
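To make the capacity law concrete, the following minimal sketch evaluates (5.95) for illustrative numbers; the function name and the example values (a 3 kHz channel at a 30 dB signal-to-noise ratio) are assumptions chosen for the demonstration, not values from the text. It also compares doubling the bandwidth against doubling the transmitted power, with $N_0$ held fixed.

```python
import math


def awgn_capacity_bps(bandwidth_hz: float, power_w: float, n0_w_per_hz: float) -> float:
    """Information capacity C = B * log2(1 + P / (N0 * B)) in bits per second, as in (5.95)."""
    if bandwidth_hz <= 0 or n0_w_per_hz <= 0 or power_w < 0:
        raise ValueError("bandwidth and N0 must be positive, power non-negative")
    snr = power_w / (n0_w_per_hz * bandwidth_hz)
    # log1p keeps precision when the signal-to-noise ratio is very small
    return bandwidth_hz * math.log1p(snr) / math.log(2.0)


if __name__ == "__main__":
    B, N0 = 3000.0, 1.0   # hypothetical channel: 3 kHz, unit noise density
    P = 1000.0 * N0 * B   # power chosen so that P / (N0 * B) = 1000 (30 dB)
    print(f"baseline:          {awgn_capacity_bps(B, P, N0):10.1f} b/s")
    print(f"double bandwidth:  {awgn_capacity_bps(2 * B, P, N0):10.1f} b/s")
    print(f"double power:      {awgn_capacity_bps(B, 2 * P, N0):10.1f} b/s")
```

In this example, doubling the bandwidth raises the capacity by much more than doubling the power, because the power enters only through the logarithm.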


To provide a plausible argument supporting the information capacity law, suppose that we 
use an encoding scheme that yields K codewords, one for each sample of the transmitted 
signal. Let n denote the length (i.e., the number of bits) of each codeword. It is presumed 
that the coding scheme is designed to produce an acceptably low probability of symbol 
error. Furthermore, the codewords satisfy the power constraint; that is, the average power 
contained in the transmission of each codeword with n bits is nP, where P is the average 
power per bit. 

Suppose that any codeword in the code is transmitted. The received vector of n bits is
Gaussian distributed with a mean equal to the transmitted codeword and a variance equal
to $n\sigma^2$, where $\sigma^2$ is the noise variance. With a high probability, we may say that the
received signal vector at the channel output lies inside a sphere of radius $\sqrt{n\sigma^2}$
centered on the transmitted codeword. This sphere is itself contained in a larger sphere of
radius $\sqrt{n(P + \sigma^2)}$, where $n(P + \sigma^2)$ is the average power of the received signal vector.

We may thus visualize the sphere packing as portrayed in Figure 5.14. With
everything inside a small sphere of radius $\sqrt{n\sigma^2}$ assigned to the codeword on which it is
centered, it is reasonable to say that, when this particular codeword is
transmitted, the probability that the received signal vector will lie inside the correct
“decoding” sphere is high. The key question is: how many decoding spheres can be packed
inside the larger sphere of received signal vectors?

[Figure 5.14: The sphere-packing problem.]


To answer this question, we want to eliminate the overlap between the decoding spheres as
depicted in Figure 5.14. Moreover, expressing the volume of an n-dimensional sphere of
radius r as $A_n r^n$, where $A_n$ is a scaling factor, we may go on to make two statements:

1. The volume of the sphere of received signal vectors is $A_n [n(P + \sigma^2)]^{n/2}$.

2. The volume of the decoding sphere is $A_n (n\sigma^2)^{n/2}$.

Accordingly, it follows that the maximum number of nonintersecting decoding spheres
that can be packed inside the sphere of possible received signal vectors is given by

\[
\frac{A_n [n(P + \sigma^2)]^{n/2}}{A_n (n\sigma^2)^{n/2}} = \left(1 + \frac{P}{\sigma^2}\right)^{n/2} = 2^{(n/2)\log_2(1 + P/\sigma^2)}
\]

Taking the logarithm of this result to base 2, we readily see that the maximum number of
bits per transmission for a low probability of error is indeed as defined previously in (5.94).

A final comment is in order: (5.94) is an idealized manifestation of Shannon’s channel-
coding theorem, in that it provides an upper bound on the physically realizable
information capacity of a communication channel.

Implications of the Information Capacity Law


Now that we have a good understanding of the information capacity law, we may go on to
discuss its implications in the context of a Gaussian channel that is limited in both power
and bandwidth. For the discussion to be useful, however, we need an ideal framework 
against which the performance of a practical communication system can be assessed. To 
this end, we introduce the notion of an ideal system, defined as a system that transmits
data at a bit rate $R_b$ equal to the information capacity C. We may then express the average
transmitted power as

\[
P = E_b C
\]

where $E_b$ is the transmitted energy per bit. Accordingly, the ideal system is defined by the
equation

\[
\frac{C}{B} = \log_2\!\left(1 + \frac{E_b}{N_0}\,\frac{C}{B}\right)
\]

Rearranging this formula, we may define the signal energy-per-bit to noise power spectral
density ratio, $E_b/N_0$, in terms of the ratio C/B for the ideal system as follows:

\[
\frac{E_b}{N_0} = \frac{2^{C/B} - 1}{C/B}
\]

A plot of the bandwidth efficiency $R_b/B$ versus $E_b/N_0$ is called the bandwidth-efficiency
diagram. A generic form of this diagram is displayed in Figure 5.15, where the curve
labeled “capacity boundary” corresponds to the ideal system for which $R_b = C$.
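The capacity boundary can be tabulated directly from the relation between $E_b/N_0$ and C/B for the ideal system. The sketch below is illustrative only (function names are ours); it evaluates $(2^{C/B} - 1)/(C/B)$ in decibels and shows the value settling toward $10\log_{10}(\ln 2)$, about −1.6 dB, as C/B becomes small.

```python
import math


def ideal_ebn0(spectral_efficiency: float) -> float:
    """E_b/N_0 = (2^(C/B) - 1) / (C/B) for the ideal system (linear scale)."""
    if spectral_efficiency <= 0:
        raise ValueError("C/B must be positive")
    # expm1 avoids cancellation in 2^x - 1 when x is tiny
    return math.expm1(spectral_efficiency * math.log(2.0)) / spectral_efficiency


def to_db(ratio: float) -> float:
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    for eta in (8.0, 4.0, 2.0, 1.0, 0.5, 0.1, 0.001):
        print(f"C/B = {eta:6.3f} b/s/Hz  ->  Eb/N0 = {to_db(ideal_ebn0(eta)):7.3f} dB")
    print(f"limit as C/B -> 0: {to_db(math.log(2.0)):.3f} dB")
```

Points to the right of this curve (larger $E_b/N_0$ at the same $R_b/B$) lie in the region where $R_b < C$.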



[Figure 5.15: Bandwidth-efficiency diagram.]

Based on Figure 5.15, we can make three observations:

1. For infinite channel bandwidth, the ratio $E_b/N_0$ approaches the limiting value

\[
\left(\frac{E_b}{N_0}\right)_{\infty} = \lim_{B \to \infty} \left(\frac{E_b}{N_0}\right) = \log_e 2 = 0.693 \tag{5.100}
\]

where $\log_e$ stands for the natural logarithm ln. The value defined in (5.100) is called
the Shannon limit for an AWGN channel, assuming a code rate of zero. Expressed in
decibels, the Shannon limit equals −1.6 dB. The corresponding limiting value of the
channel capacity is obtained by letting the channel bandwidth B in (5.95) approach
infinity, in which case we obtain

\[
C_{\infty} = \lim_{B \to \infty} C = \frac{P}{N_0} \log_2 e
\]

2. The capacity boundary is defined by the curve for the critical bit rate $R_b = C$. For
any point on this boundary, we may flip a fair coin (with probability of 1/2) whether
we have error-free transmission or not. As such, the boundary separates
combinations of system parameters that have the potential for supporting error-free
transmission ($R_b < C$) from those for which error-free transmission is not possible
($R_b > C$). The latter region is shown shaded in Figure 5.15.

3. The diagram highlights potential trade-offs among three quantities: the $E_b/N_0$, the
ratio $R_b/B$, and the probability of symbol error $P_e$. In particular, we may view
movement of the operating point along a horizontal line as trading $P_e$ versus $E_b/N_0$
for a fixed $R_b/B$. On the other hand, we may view movement of the operating point
along a vertical line as trading $P_e$ versus $R_b/B$ for a fixed $E_b/N_0$.

Capacity of Binary-Input AWGN Channel

In this example, we investigate the capacity of an AWGN channel using encoded binary
antipodal signaling (i.e., levels −1 and +1 for binary symbols 0 and 1, respectively). In
particular, we address the issue of determining the minimum achievable bit error rate as a
function of $E_b/N_0$ for varying code rate r. It is assumed that the binary symbols 0 and 1 are
equiprobable.

Let the random variables X and Y denote the channel input and channel output
respectively; X is a discrete variable, whereas Y is a continuous variable. In light of the
second line of (5.81), we may express the mutual information between the channel input
and channel output as

\[
I(X;Y) = h(Y) - h(Y|X)
\]

The second term, h(Y|X), is the conditional differential entropy of the channel output Y,
given the channel input X. By virtue of (5.89) and (5.93), this term is just the entropy of a
Gaussian distribution. Hence, using $\sigma^2$ to denote the variance of the channel noise, we write

\[
h(Y|X) = \frac{1}{2} \log_2(2\pi e \sigma^2)
\]
Next, the first term, h(Y), is the differential entropy of the channel output Y. With the use of
binary antipodal signaling and equiprobable symbols, the probability density function of Y is a
mixture of two Gaussian distributions with common variance $\sigma^2$ and mean values −1 and +1, as
shown by

\[
f_Y(y) = \frac{1}{2}\left\{ \frac{\exp[-(y+1)^2/2\sigma^2]}{\sqrt{2\pi}\,\sigma} + \frac{\exp[-(y-1)^2/2\sigma^2]}{\sqrt{2\pi}\,\sigma} \right\} \tag{5.102}
\]

Hence, we may determine the differential entropy of Y using the formula

\[
h(Y) = -\int_{-\infty}^{\infty} f_Y(y) \log_2[f_Y(y)]\, dy
\]

where $f_Y(y)$ is defined by (5.102). From the formulas of h(Y|X) and h(Y), it is clear that
the mutual information is solely a function of the noise variance $\sigma^2$. Using $M(\sigma^2)$ to
denote this functional dependence, we may thus write

\[
I(X;Y) = M(\sigma^2)
\]

Unfortunately, there is no closed formula that we can derive for $M(\sigma^2)$ because of the
difficulty of determining h(Y). Nevertheless, the differential entropy h(Y) can be well
approximated using Monte Carlo integration; see Appendix E for details.

Because symbols 0 and 1 are equiprobable, it follows that the channel capacity C is
equal to the mutual information between X and Y. Hence, for error-free data transmission
over the AWGN channel, the code rate r must satisfy the condition

\[
r < M(\sigma^2) \tag{5.103}
\]

A robust measure of the ratio $E_b/N_0$ is

\[
\frac{E_b}{N_0} = \frac{P}{N_0 r} = \frac{P}{2\sigma^2 r}
\]

where P is the average transmitted power and $N_0/2$ is the two-sided power spectral density
of the channel noise. Without loss of generality, we may set P = 1. We may then express
the noise variance as

\[
\sigma^2 = \frac{N_0}{2 E_b r} \tag{5.104}
\]

Substituting Equation (5.104) into (5.103) and rearranging terms, we get the desired
relation:

\[
\frac{E_b}{N_0} = \frac{1}{2 r M^{-1}(r)}
\]

where $M^{-1}(r)$ is the inverse of the mutual information between the channel input and
output, expressed as a function of the code rate r.

Using the Monte Carlo method to estimate the differential entropy h(Y) and therefore
$M^{-1}(r)$, the plots of Figure 5.16 are computed. Figure 5.16a plots the minimum $E_b/N_0$
versus the code rate r for error-free transmission. Figure 5.16b plots the minimum
achievable bit error rate versus $E_b/N_0$ with the code rate r as a running parameter.

[Figure 5.16: Binary antipodal signaling over an AWGN channel. (a) Minimum $E_b/N_0$ versus the
code rate r. (b) Minimum bit error rate versus $E_b/N_0$ for varying code rate r.]

From Figure 5.16 we may draw the following conclusions:

• For uncoded binary signaling (i.e., r = 1), an infinite $E_b/N_0$ is required for error-free
communication, which agrees with what we know about uncoded data transmission
over an AWGN channel.

• The minimum $E_b/N_0$ decreases with decreasing code rate r, which is intuitively
satisfying. For example, for r = 1/2, the minimum value of $E_b/N_0$ is slightly less than
0.2 dB.

• As r approaches zero, the minimum $E_b/N_0$ approaches the limiting value of −1.6 dB,
which agrees with the Shannon limit derived earlier; see (5.100).
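The computation behind Figure 5.16a can be sketched in a few lines. Note the swap of technique: the text estimates h(Y) by Monte Carlo integration, whereas this sketch uses deterministic midpoint-rule quadrature so that the result is repeatable. The function names, the integration range of ten standard deviations beyond each signal level, and the bisection bounds on $\sigma^2$ are all assumptions made for the illustration. With P = 1, the minimum $E_b/N_0$ for code rate r is $1/(2 r \sigma^2)$, where $\sigma^2$ solves $M(\sigma^2) = r$.

```python
import math


def biawgn_mutual_information(sigma2: float, steps: int = 2000) -> float:
    """M(sigma^2) = h(Y) - 0.5*log2(2*pi*e*sigma^2), in bits per channel use,
    for equiprobable inputs -1 and +1 in Gaussian noise of variance sigma2."""
    sigma = math.sqrt(sigma2)
    lo, hi = -1.0 - 10.0 * sigma, 1.0 + 10.0 * sigma  # tails beyond this are negligible
    dy = (hi - lo) / steps
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    h_y = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * dy
        # two-component Gaussian mixture density of the channel output
        f = 0.5 * norm * (
            math.exp(-((y + 1.0) ** 2) / (2.0 * sigma2))
            + math.exp(-((y - 1.0) ** 2) / (2.0 * sigma2))
        )
        if f > 0.0:
            h_y -= f * math.log2(f) * dy
    return h_y - 0.5 * math.log2(2.0 * math.pi * math.e * sigma2)


def min_ebn0_db(rate: float, iters: int = 50) -> float:
    """Smallest E_b/N_0 (dB) allowing code rate `rate`, found by bisection on sigma^2."""
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must lie strictly between 0 and 1")
    lo, hi = 1e-3, 1e3  # M(lo) is close to 1 bit, M(hi) is close to 0
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a logarithmic scale
        if biawgn_mutual_information(mid) > rate:
            lo = mid  # channel still supports the rate: allow more noise
        else:
            hi = mid
    sigma2 = math.sqrt(lo * hi)
    return 10.0 * math.log10(1.0 / (2.0 * rate * sigma2))


if __name__ == "__main__":
    for r in (0.75, 0.5, 0.25, 0.05):
        print(f"r = {r:4.2f}: minimum Eb/N0 = {min_ebn0_db(r):6.3f} dB")
```

The value printed for r = 1/2 can be compared with the "slightly less than 0.2 dB" quoted above, and the values for small r with the −1.6 dB Shannon limit.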


Information Capacity of Colored Noisy Channel 


The information capacity theorem as formulated in (5.95) applies to a band-limited white 
noise channel. In this section we extend Shannon’s information capacity law to the more 
general case of a nonwhite, or colored, noisy channel. To be specific, consider the
channel model shown in Figure 5.17a where the transfer function of the channel is denoted
by H(f). The channel noise n(t), which appears additively at the channel output, is
modeled as the sample function of a stationary Gaussian process of zero mean and power
spectral density $S_{NN}(f)$. The requirement is twofold:

1. Find the input ensemble, described by the power spectral density $S_{XX}(f)$, that
maximizes the mutual information between the channel output y(t) and the channel
input x(t), subject to the constraint that the average power of x(t) is fixed at a
constant value P.

2. Hence, determine the optimum information capacity of the channel.

[Figure 5.17: (a) Model of band-limited, power-limited noisy channel. (b) Equivalent
model of the channel.]


This problem is a constrained optimization problem. To solve it, we proceed as follows: 

• Because the channel is linear, we may replace the model of Figure 5.17a with the
equivalent model shown in Figure 5.17b. From the viewpoint of the spectral
characteristics of the signal plus noise measured at the channel output, the two
models of Figure 5.17 are equivalent, provided that the power spectral density of the
noise n'(t) in Figure 5.17b is defined in terms of the power spectral density of the
noise n(t) in Figure 5.17a as

\[
S_{N'N'}(f) = \frac{S_{NN}(f)}{|H(f)|^2}
\]

where |H(f)| is the magnitude response of the channel.

• To simplify the analysis, we use the “principle of divide and conquer” to
approximate the continuous |H(f)| described as a function of frequency f in the form
of a staircase, as illustrated in Figure 5.18. Specifically, the channel is divided into a
large number of adjoining frequency slots. The smaller we make the incremental
frequency interval $\Delta f$ of each subchannel, the better this approximation is.

The net result of these two points is that the original model of Figure 5.17a is replaced by
the parallel combination of a finite number of subchannels, N, each of which is corrupted
essentially by “band-limited white Gaussian noise.”




[Figure 5.18: Staircase approximation of an arbitrary magnitude response
|H(f)|; only the positive frequency portion of the response is shown.]

The kth subchannel in the approximation to the model of Figure 5.17b is described by

\[
y_k(t) = x_k(t) + n_k(t), \qquad k = 1, 2, \ldots, N
\]

The average power of the signal component $x_k(t)$ is

\[
P_k = S_{XX}(f_k)\,\Delta f, \qquad k = 1, 2, \ldots, N \tag{5.108}
\]

where $S_{XX}(f_k)$ is the power spectral density of the input signal evaluated at the frequency
$f = f_k$. The variance of the noise component $n_k(t)$ is

\[
\sigma_k^2 = \frac{S_{NN}(f_k)}{|H(f_k)|^2}\,\Delta f, \qquad k = 1, 2, \ldots, N \tag{5.109}
\]

where $S_{NN}(f_k)$ and $|H(f_k)|$ are the noise spectral density and the channel’s magnitude response
evaluated at the frequency $f_k$, respectively. The information capacity of the kth subchannel is

\[
C_k = \frac{1}{2}\,\Delta f \log_2\!\left(1 + \frac{P_k}{\sigma_k^2}\right), \qquad k = 1, 2, \ldots, N
\]

where the factor 1/2 accounts for the fact that $\Delta f$ applies to both positive and negative
frequencies. All the N subchannels are independent of one another. Hence, the total
capacity of the overall channel is approximately given by the summation

\[
C \approx \sum_{k=1}^{N} C_k = \frac{1}{2} \sum_{k=1}^{N} \Delta f \log_2\!\left(1 + \frac{P_k}{\sigma_k^2}\right) \tag{5.111}
\]

The problem we have to address is to maximize the overall information capacity C subject
to the constraint

\[
\sum_{k=1}^{N} P_k = P = \text{constant} \tag{5.112}
\]

The usual procedure to solve a constrained optimization problem is to use the method of
Lagrange multipliers (see Appendix D for a discussion of this method). To proceed with
this optimization, we first define an objective function that incorporates both the
information capacity C and the constraint (i.e., (5.111) and (5.112)), as shown by

\[
J(P_k) = \frac{1}{2} \sum_{k=1}^{N} \Delta f \log_2\!\left(1 + \frac{P_k}{\sigma_k^2}\right) + \lambda\left(P - \sum_{k=1}^{N} P_k\right)
\]

where $\lambda$ is the Lagrange multiplier. Next, differentiating the objective function $J(P_k)$ with
respect to $P_k$ and setting the result equal to zero, we obtain

\[
\frac{\tfrac{1}{2}\,\Delta f \log_2 e}{P_k + \sigma_k^2} - \lambda = 0
\]

To satisfy this optimizing solution, we impose the following requirement:

\[
P_k + \sigma_k^2 = K\,\Delta f \quad \text{for } k = 1, 2, \ldots, N \tag{5.114}
\]

where K is a constant that is the same for all k. The constant K is chosen to satisfy the
average power constraint.

Inserting the defining values of (5.108) and (5.109) in the optimizing condition of
(5.114), simplifying, and rearranging terms we get

\[
S_{XX}(f_k) = K - \frac{S_{NN}(f_k)}{|H(f_k)|^2}, \qquad k = 1, 2, \ldots, N \tag{5.115}
\]

Let $\mathcal{F}_A$ denote the frequency range for which the constant K satisfies the condition

\[
K \ge \frac{S_{NN}(f)}{|H(f)|^2}
\]

Then, as the incremental frequency interval $\Delta f$ is allowed to approach zero and the
number of subchannels N goes to infinity, we may use (5.115) to formally state that the
power spectral density of the input ensemble that achieves the optimum information
capacity is a nonnegative quantity defined by

\[
S_{XX}(f) = \begin{cases} K - \dfrac{S_{NN}(f)}{|H(f)|^2}, & f \in \mathcal{F}_A \\[6pt] 0, & \text{otherwise} \end{cases} \tag{5.116}
\]

Because the average power of a random process is the total area under the curve of the
power spectral density of the process, we may express the average power of the channel
input x(t) as

\[
P = \int_{f \in \mathcal{F}_A} \left(K - \frac{S_{NN}(f)}{|H(f)|^2}\right) df \tag{5.117}
\]

For a prescribed P and specified $S_{NN}(f)$ and H(f), the constant K is the solution to (5.117).

The only thing that remains for us to do is to find the optimum information capacity.
Substituting the optimizing solution of (5.114) into (5.111) and then using the defining
values of (5.108) and (5.109), we obtain

\[
C \approx \frac{1}{2} \sum_{k=1}^{N} \Delta f \log_2\!\left(K \frac{|H(f_k)|^2}{S_{NN}(f_k)}\right)
\]

When the incremental frequency interval $\Delta f$ is allowed to approach zero, this equation
takes the limiting form

\[
C = \frac{1}{2} \int_{f \in \mathcal{F}_A} \log_2\!\left(K \frac{|H(f)|^2}{S_{NN}(f)}\right) df \tag{5.118}
\]

where the constant K is chosen as the solution to (5.117) for a prescribed input signal
power P.

Equations (5.116) and (5.117) suggest the picture portrayed in Figure 5.19. Specifically,
we make the following observations:

• The appropriate input power spectral density $S_{XX}(f)$ is described as the bottom
regions of the function $S_{NN}(f)/|H(f)|^2$ that lie below the constant level K, which are
shown shaded.

• The input power P is defined by the total area of these shaded regions.

The spectral-domain picture portrayed here is called the water-filling (pouring)
interpretation, in the sense that the process by which the input power is distributed across
the function $S_{NN}(f)/|H(f)|^2$ is identical to the way in which water distributes itself in a vessel.
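The water-filling rule can be sketched for the staircase (finite-N) model. In the illustration below, `levels[k]` stands for $S_{NN}(f_k)/|H(f_k)|^2$, `delta_f` for $\Delta f$, and the water level K is found by bisection so that the allocated powers $P_k = \max(K - \text{levels}[k], 0)\,\Delta f$ sum to P. The function name, the bisection approach, and the example numbers are assumptions made for this sketch; the text itself only states the optimality condition.

```python
import math


def water_fill(levels, delta_f: float, total_power: float, iters: int = 100):
    """Return (K, powers, capacity) for subchannels with noise-to-gain levels
    levels[k] = S_NN(f_k) / |H(f_k)|^2, each of width delta_f."""
    if total_power <= 0 or delta_f <= 0 or min(levels) <= 0:
        raise ValueError("power, delta_f and all levels must be positive")

    def used_power(k_level: float) -> float:
        return sum(max(k_level - g, 0.0) for g in levels) * delta_f

    lo = min(levels)                                         # no power is poured at this level
    hi = max(levels) + total_power / (len(levels) * delta_f)  # guaranteed to use at least P
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if used_power(mid) > total_power:
            hi = mid
        else:
            lo = mid
    k_level = 0.5 * (lo + hi)
    powers = [max(k_level - g, 0.0) * delta_f for g in levels]
    # total capacity: 0.5 * sum of delta_f * log2(1 + P_k / sigma_k^2), sigma_k^2 = level * delta_f
    capacity = 0.5 * sum(
        delta_f * math.log2(1.0 + p / (g * delta_f)) for p, g in zip(powers, levels)
    )
    return k_level, powers, capacity


if __name__ == "__main__":
    K, powers, C = water_fill([1.0, 2.0, 5.0], delta_f=1.0, total_power=3.0)
    print(f"water level K = {K:.4f}")
    print("powers:", [round(p, 4) for p in powers])
    print(f"capacity = {C:.4f} bits per second")
```

In this example the noisiest subchannel sits above the water level and receives no power, which is the shaded-region picture in discrete form.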

Consider now the idealized case of a band-limited signal in an AWGN channel of power spectral
density $S_{NN}(f) = N_0/2$. The transfer function H(f) is that of an ideal band-pass filter defined by

\[
H(f) = \begin{cases} 1, & 0 \le f_c - \dfrac{B}{2} \le |f| \le f_c + \dfrac{B}{2} \\[6pt] 0, & \text{otherwise} \end{cases}
\]

where $f_c$ is the midband frequency and B is the channel bandwidth. For this special case,
(5.117) and (5.118) reduce respectively to

\[
P = 2B\left(K - \frac{N_0}{2}\right)
\]

and

\[
C = B \log_2\!\left(\frac{2K}{N_0}\right)
\]

Hence, eliminating K between these two equations, we get the standard form of Shannon’s
capacity theorem, defined by (5.95).
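The elimination of K can be written out explicitly. The short derivation below assumes the special-case forms of (5.117) and (5.118) for the ideal band-pass channel, namely $P = 2B(K - N_0/2)$ and $C = B\log_2(2K/N_0)$.

```latex
\begin{align*}
P = 2B\left(K - \frac{N_0}{2}\right)
  &\;\Longrightarrow\; K = \frac{P}{2B} + \frac{N_0}{2} \\
\frac{2K}{N_0}
  &= \frac{P}{N_0 B} + 1 \\
C = B \log_2\!\left(\frac{2K}{N_0}\right)
  &= B \log_2\!\left(1 + \frac{P}{N_0 B}\right)
\end{align*}
```

The last line is (5.95), so the water-filling solution reduces to the white-noise capacity law when the level $S_{NN}(f)/|H(f)|^2$ is flat over the passband.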


[Figure 5.19: Water-filling interpretation of information-capacity
theorem for a colored noisy channel.]


Capacity of NEXT-Dominated Channel

Digital subscriber lines (DSLs) refer to a family of different technologies that operate
over a closed transmission loop; they will be discussed in Chapter 8, Section 8.11. For the
present, it suffices to say that a DSL is designed to provide for data transmission between
a user terminal (e.g., computer) and the central office of a telephone company. A major
channel impairment that arises in the deployment of a DSL is the near-end cross-talk
(NEXT). The power spectral density of this crosstalk may be taken as

\[
S_{NN}(f) = |H_{\text{NEXT}}(f)|^2\, S_{XX}(f) \tag{5.119}
\]

where $S_{XX}(f)$ is the power spectral density of the transmitted signal and $H_{\text{NEXT}}(f)$ is the
transfer function that couples adjacent twisted pairs. The only constraint we have to satisfy
in this example is that the power spectral density function $S_{XX}(f)$ be nonnegative for all f.
Substituting (5.119) into (5.116), we readily find that this condition is satisfied by solving
for K as

\[
K = \left(1 + \frac{|H_{\text{NEXT}}(f)|^2}{|H(f)|^2}\right) S_{XX}(f)
\]

Finally, using this result in (5.118), we find that the capacity of the NEXT-dominated
digital subscriber channel is given by

\[
C = \frac{1}{2} \int_{f \in \mathcal{F}_A} \log_2\!\left(1 + \frac{|H(f)|^2}{|H_{\text{NEXT}}(f)|^2}\right) df
\]

where $\mathcal{F}_A$ is the set of positive and negative frequencies for which $S_{XX}(f) > 0$.

Rate Distortion Theory


In Section 5.3 we introduced the source-coding theorem for a discrete memoryless source,
according to which the average codeword length must be at least as large as the source
entropy for perfect coding (i.e., perfect representation of the source). However, in many
practical situations there are constraints that force the coding to be imperfect, thereby
resulting in unavoidable distortion. For example, constraints imposed by a communication
channel may place an upper limit on the permissible code rate and, therefore, on average
codeword length assigned to the information source. As another example, the information
source may have a continuous amplitude as in the case of speech, and the requirement is to
quantize the amplitude of each sample generated by the source to permit its representation
by a codeword of finite length as in pulse-code modulation to be discussed in Chapter 6. In
such cases, the problem is referred to as source coding with a fidelity criterion, and the
branch of information theory that deals with it is called rate distortion theory. Rate
distortion theory finds applications in two types of situations:

• Source coding where the permitted coding alphabet cannot exactly represent the
information source, in which case we are forced to do lossy data compression.

• Information transmission at a rate greater than channel capacity.

Accordingly, rate distortion theory may be viewed as a natural extension of Shannon’s
coding theorem.
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Consider a discrete memoryless source defined by an M- ary alphabet % : {x,- 1 i = 1,2, . . M), 
which consists of a set of statistically independent symbols together with the associated sym- 
bol probabilities {pj\i = 1,2, M). Let R be the average code rate in bits per codeword. 

The representation codewords are taken from another alphabet ty'.iyjj = 1, 2, /V}. The 

source-coding theorem states that this second alphabet provides a perfect representation of 
the source provided that R> H, where H is the source entropy. But if we are forced to have 
R < H, then there is unavoidable distortion and, therefore, loss of information. 

Let p(xj, yj ) denote the joint probability of occurrence of source symbol x t and 
representation symbol yj. From probability theory, we have 

p(x { , yj) = p(yj\xf)p(xj) 

where p(yj | Xj) is a transition probability. Let d(x t , yj) denote a measure of the cost incurred 
in representing the source symbol x t by the symbol yy the quantity d(x r yj) is referred to as 
a single-letter distortion measure. The statistical average of d(x t , yj) over all possible 
source symbols and representation symbols is given by 


d = 


M N 


i = U = 1 


Note that the average distortion d is a nonnegative continuous function of the transition 
probabilities p(yj \ xj) that are determined by the source encoder-decoder pair. 

A conditional probability assignment p(yj\xj) is said to be D-admissible if, and only if, 
the average distortion d is less than or equal to some acceptable value D. The set of all 
D-admissible conditional probability assignments is denoted by 

% = {pb^y-d^D} 

For each set of transition probabilities, we have a mutual information 


I(X-Y) 


M N 

X X p( x i)p(yj\ x i) log 

i = ij = i 


(p(yj\Xi)} 

{ p(yj) ) 


A rate distortion function R(D) is defined as the smallest coding rate possible for which 
the average distortion is guaranteed not to exceed D. Let '.'Pp denote the set to which the 
conditional probability p(yj\xf) belongs for a prescribed D. Then, for a fixed D we write 

R(D) = min I(X\Y) 

p{yj\Xi) e % 

subject to the constraint 

N 

X p(yj\ x i) = 1 for i= i, 2, ..., m 

j= i 

The rate distortion function R{D) is measured in units of bits if the base-2 logarithm is 
used in (5.123). Intuitively, we expect the distortion D to decrease as the rate distortion 
function R(D) is increased. We may say conversely that tolerating a large distortion D 
permits the use of a smaller rate for coding and/or transmission of information. 
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Summary of rate 
distortion theory. 


Probability of 
occurrence = p ; 



Figure 5.20 summarizes the main parameters of rate distortion theory. In particular,
given the source symbols $\{x_i\}$ and their probabilities $\{p_i\}$, and given a definition of the
single-letter distortion measure $d(x_i, y_j)$, the calculation of the rate distortion function R(D)
involves finding the conditional probability assignment $p(y_j|x_i)$ subject to certain
constraints imposed on $p(y_j|x_i)$. This is a variational problem, the solution of which is
unfortunately not straightforward in general.
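Although the minimization itself is hard, the two quantities that enter it are easy to evaluate for any one candidate assignment $p(y_j|x_i)$. The following Python sketch does this for a binary equiprobable source with a Hamming distortion measure; the function names, the test channel, and the numerical values are assumptions chosen for illustration only, and the sketch does not perform the minimization that defines R(D).

```python
import math

def average_distortion(p_x, p_y_given_x, d):
    """Average distortion: sum over i, j of p(x_i) p(y_j|x_i) d(x_i, y_j)."""
    return sum(p_x[i] * p_y_given_x[i][j] * d[i][j]
               for i in range(len(p_x))
               for j in range(len(d[i])))

def mutual_information(p_x, p_y_given_x):
    """Mutual information I(X;Y) in bits for a given transition matrix."""
    n_out = len(p_y_given_x[0])
    # Marginal p(y_j) = sum over i of p(x_i) p(y_j|x_i)
    p_y = [sum(p_x[i] * p_y_given_x[i][j] for i in range(len(p_x)))
           for j in range(n_out)]
    total = 0.0
    for i, px in enumerate(p_x):
        for j in range(n_out):
            q = p_y_given_x[i][j]
            if px > 0 and q > 0:  # terms with zero probability contribute nothing
                total += px * q * math.log2(q / p_y[j])
    return total

# Hypothetical example: binary source, Hamming distortion, symmetric test channel.
p_x = [0.5, 0.5]
hamming = [[0, 1], [1, 0]]
test_channel = [[0.9, 0.1], [0.1, 0.9]]

d_bar = average_distortion(p_x, test_channel, hamming)
rate = mutual_information(p_x, test_channel)
print(f"average distortion = {d_bar:.3f}, I(X;Y) = {rate:.4f} bits")
```

An assignment is D-admissible when the computed average distortion does not exceed D; R(D) is then the smallest mutual information over all such assignments.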


Gaussian Source

Consider a discrete-time, memoryless Gaussian source with zero mean and variance $\sigma^2$.
Let x denote the value of a sample generated by such a source. Let y denote a quantized
version of x that permits a finite representation of it. The square-error distortion

$$d(x, y) = (x - y)^2$$

provides a distortion measure that is widely used for continuous alphabets. The rate
distortion function for the Gaussian source with square-error distortion, as described
herein, is given by

$$R(D) = \begin{cases} \frac{1}{2} \log\left( \frac{\sigma^2}{D} \right), & 0 \le D \le \sigma^2 \\ 0, & D > \sigma^2 \end{cases}$$

In this case, we see that $R(D) \to \infty$ as $D \to 0$, and $R(D) = 0$ for $D = \sigma^2$.
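As a quick numerical check of this formula, the following Python sketch evaluates R(D) in bits per sample (base-2 logarithm) and inverts it to obtain the distortion reached at a given rate. The function names are hypothetical and the sample values are chosen only for illustration.

```python
import math

def gaussian_rate_distortion(variance, distortion):
    """R(D) in bits per sample for a zero-mean Gaussian source, square-error distortion."""
    if variance <= 0 or distortion <= 0:
        raise ValueError("variance and distortion must be positive")
    if distortion >= variance:
        return 0.0  # no bits are needed once D reaches the source variance
    return 0.5 * math.log2(variance / distortion)

def distortion_at_rate(variance, rate_bits):
    """Inverse relation: D = variance * 2**(-2R) for R >= 0."""
    return variance * 2.0 ** (-2.0 * rate_bits)

print(gaussian_rate_distortion(1.0, 0.25))  # 0.5 * log2(4) = 1.0 bit per sample
print(distortion_at_rate(1.0, 2.0))         # 2**(-4) = 0.0625
```

The inverse relation shows that each additional bit per sample reduces the square-error distortion by a factor of 4.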


Set of Parallel Gaussian Sources 


Consider next a set of N independent Gaussian random variables $\{X_i\}_{i=1}^{N}$, where $X_i$ has
zero mean and variance $\sigma_i^2$. Using the distortion measure

$$d = \sum_{i=1}^{N} (x_i - \hat{x}_i)^2, \qquad \hat{x}_i = \text{estimate of } x_i$$

and building on the result of Example 12, we may express the rate distortion function for
the set of parallel Gaussian sources described here as

$$R(D) = \sum_{i=1}^{N} \frac{1}{2} \log\left( \frac{\sigma_i^2}{D_i} \right)$$

where $D_i$ is itself defined by

$$D_i = \begin{cases} \lambda, & \lambda < \sigma_i^2 \\ \sigma_i^2, & \lambda \ge \sigma_i^2 \end{cases}$$

and the constant $\lambda$ is chosen so as to satisfy the condition

$$\sum_{i=1}^{N} D_i = D$$

Compared to Figure 5.19, (5.128) and (5.129) may be interpreted as a kind of “water-filling
in reverse,” as illustrated in Figure 5.21. First, we choose a constant $\lambda$ and describe only the
subset of random variables whose variances exceed the constant $\lambda$. No bits are used to
describe the remaining subset of random variables whose variances are less than the
constant $\lambda$.
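The reverse water-filling rule can be sketched in a few lines of Python. The sketch below is one possible implementation under stated assumptions: it finds the constant by bisection (a choice made here, not a method prescribed by the text), it measures the rate in bits, and the function name and the example variances are hypothetical.

```python
import math

def reverse_water_filling(variances, total_distortion, tol=1e-12):
    """Return (lam, per-source distortions D_i, total rate in bits).

    Assumes 0 < total_distortion <= sum(variances). The water level lam is
    found by bisection so that sum_i min(lam, variance_i) = total_distortion.
    """
    if not 0 < total_distortion <= sum(variances):
        raise ValueError("total distortion must lie in (0, sum of variances]")
    lo, hi = 0.0, max(variances)
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if sum(min(lam, v) for v in variances) < total_distortion:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    # D_i = lam where lam is below the variance, otherwise the variance itself
    distortions = [min(lam, v) for v in variances]
    rate = sum(0.5 * math.log2(v / d) for v, d in zip(variances, distortions))
    return lam, distortions, rate

# Hypothetical example: three sources, total distortion budget D = 1.75.
lam, distortions, rate = reverse_water_filling([4.0, 1.0, 0.25], 1.75)
print(f"lambda = {lam:.4f}, D_i = {distortions}, R(D) = {rate:.4f} bits")
```

In this example the third source has a variance below the water level, so its distortion equals its variance and it contributes no bits to the total rate.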



[Figure 5.21: Reverse water-filling picture for a set of parallel Gaussian processes. Horizontal
axis: source index i.]


Summary and Discussion 


In this chapter we established two fundamental limits on different aspects of a communi- 
cation system, which are embodied in the source-coding theorem and the channel-coding 
theorem. 

The source-coding theorem, Shannon’s first theorem, provides the mathematical tool
for assessing data compaction; that is, lossless compression of data generated by a
discrete memoryless source. The theorem teaches us that we can make the average number
of binary code elements (bits) per source symbol as small as, but no smaller than, the
entropy of the source measured in bits. The entropy of a source is a function of the
probabilities of the source symbols that constitute the alphabet of the source. Since
entropy is a measure of uncertainty, the entropy is maximum when the associated
probability distribution generates maximum uncertainty.
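To make the last point concrete, the following Python sketch computes the entropy of a discrete distribution in bits and compares a uniform distribution with a skewed one over the same four-symbol alphabet. The function name and the two distributions are hypothetical examples.

```python
import math

def entropy_bits(probabilities):
    """Entropy in bits of a discrete distribution; zero-probability terms are skipped."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

uniform = entropy_bits([0.25, 0.25, 0.25, 0.25])  # log2(4) = 2 bits
skewed = entropy_bits([0.7, 0.1, 0.1, 0.1])       # strictly less than 2 bits
print(uniform, skewed)
```

The uniform distribution attains the largest value, log2 of the alphabet size; any departure from it lowers the entropy.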

The channel-coding theorem, Shannon’s second theorem, is both the most surprising
and the single most important result of information theory. For a binary symmetric
channel, the channel-coding theorem teaches us that, for any code rate r less than or equal
to the channel capacity C, codes do exist such that the average probability of error is as
small as we want it. A binary symmetric channel is the simplest form of a discrete
memoryless channel. It is symmetric, because the probability of receiving symbol 1 if
symbol 0 is sent is the same as the probability of receiving symbol 0 if symbol 1 is sent.
This probability, the probability that an error will occur, is termed a transition probability.
The transition probability p is determined not only by the additive noise at the channel
output, but also by the kind of receiver used. The value of p uniquely defines the channel
capacity C.
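The dependence of C on p can be tabulated with a short Python sketch. It uses the closed form C = 1 - H(p) that Problem 5.23 asks the reader to derive, with H(p) the binary entropy function in bits; the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Binary entropy function H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_capacity(p):
    """Capacity in bits per channel use of a binary symmetric channel, C = 1 - H(p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("transition probability must lie in [0, 1]")
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.1, 0.5):
    print(f"p = {p:.1f}: C = {bsc_capacity(p):.4f} bits per channel use")
```

The capacity is 1 bit per channel use for a noiseless channel (p = 0) and falls to zero at p = 1/2, where the output is statistically independent of the input.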

The information capacity law, an application of the channel-coding theorem, teaches us 
that there is an upper limit to the rate at which any communication system can operate 
reliably (i.e., free of errors) when the system is constrained in power. This maximum rate, 
called the information capacity, is measured in bits per second. When the system operates 
at a rate greater than the information capacity, it is condemned to a high probability of 
error, regardless of the choice of signal set used for transmission or the receiver used for 
processing the channel output. 
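As an illustration, the following Python sketch evaluates the information capacity law in its standard form, C = B log2(1 + SNR), where B is the channel bandwidth in hertz and SNR is the signal-to-noise ratio expressed as a power ratio (not in decibels). The function names and the numerical values are hypothetical.

```python
import math

def db_to_linear(value_db):
    """Convert a power ratio from decibels to a linear ratio."""
    return 10.0 ** (value_db / 10.0)

def information_capacity(bandwidth_hz, snr_linear):
    """Information capacity in bits per second: C = B * log2(1 + SNR)."""
    if bandwidth_hz <= 0 or snr_linear < 0:
        raise ValueError("bandwidth must be positive and SNR nonnegative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# Hypothetical channel: 1 MHz bandwidth, linear SNR of 15.
capacity = information_capacity(1.0e6, 15.0)
print(f"C = {capacity:.0f} bits/s")  # 1e6 * log2(16) = 4e6 bits/s
```

Because the signal-to-noise ratio enters through a logarithm, doubling the power in this example adds less than one bit per second per hertz, whereas the bandwidth enters as a direct multiplier for a fixed signal-to-noise ratio.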

When the output of a source of information is compressed in a lossy manner, the
resulting data stream usually contains redundant bits. These redundant bits can be 
removed by using a lossless algorithm such as Huffman coding or the Lempel-Ziv 
algorithm for data compaction. We may thus speak of data compression followed by data 
compaction as two constituents of the dissection of source coding, which is so called 
because it refers exclusively to the sources of information. 

We conclude this chapter on Shannon’s information theory by pointing out that, in 
many practical situations, there are constraints that force source coding to be imperfect, 
thereby resulting in unavoidable distortion. For example, constraints imposed by a 
communication channel may place an upper limit on the permissible code rate and, 
therefore, average codeword length assigned to the information source. As another 
example, the information source may have a continuous amplitude, as in the case of 
speech, and the requirement is to quantize the amplitude of each sample generated by the 
source to permit its representation by a codeword of finite length, as in pulse-code 
modulation discussed in Chapter 6. In such cases, the information-theoretic problem is 
referred to as source coding with a fidelity criterion, and the branch of information theory 
that deals with it is called rate distortion theory, which may be viewed as a natural 
extension of Shannon’s coding theorem. 


Problems 

Entropy 

Let p denote the probability of some event. Plot the amount of information gained by the occurrence
of this event for $0 \le p \le 1$.

A source emits one of four possible symbols during each signaling interval. The symbols occur with
the probabilities

$p_0 = 0.4$
$p_1 = 0.3$
$p_2 = 0.2$
$p_3 = 0.1$

which sum to unity as they should. Find the amount of information gained by observing the source 
emitting each of these symbols. 

A source emits one of four symbols $s_0$, $s_1$, $s_2$, and $s_3$ with probabilities 1/3, 1/6, 1/4 and 1/4,
respectively. The successive symbols emitted by the source are statistically independent. Calculate
the entropy of the source.

Let X represent the outcome of a single roll of a fair die. What is the entropy of X?

The sample function of a Gaussian process of zero mean and unit variance is uniformly sampled and 
then applied to a uniform quantizer having the input-output amplitude characteristic shown in 
Figure P5.5. Calculate the entropy of the quantizer output. 


[Figure P5.5: Input-output amplitude characteristic of the quantizer, with output levels of ±0.5
and ±1.5.]


Consider a discrete memoryless source with source alphabet $S = \{s_0, s_1, \ldots, s_{K-1}\}$ and source statistics
$\{p_0, p_1, \ldots, p_{K-1}\}$. The nth extension of this source is another discrete memoryless source with source
alphabet $S^{(n)} = \{\sigma_0, \sigma_1, \ldots, \sigma_{M-1}\}$, where $M = K^n$. Let $P(\sigma_i)$ denote the probability of $\sigma_i$.

Show that, as expected,

$$\sum_{i=0}^{M-1} P(\sigma_i) = 1$$

Show that

$$\sum_{i=0}^{M-1} P(\sigma_i) \log_2\left( \frac{1}{p_{i_k}} \right) = H(S), \quad k = 1, 2, \ldots, n$$

where $p_{i_k}$ is the probability of symbol $s_{i_k}$ and H(S) is the entropy of the original source.

Hence, show that

$$H(S^{(n)}) = \sum_{i=0}^{M-1} P(\sigma_i) \log_2\left( \frac{1}{P(\sigma_i)} \right) = nH(S)$$

Consider a discrete memoryless source with source alphabet $S = \{s_0, s_1, s_2\}$ and source statistics
{0.7, 0.15, 0.15}.

Calculate the entropy of the source. 

Calculate the entropy of the second-order extension of the source. 

It may come as a surprise, but the number of bits needed to store text is much less than that required 
to store its spoken equivalent. Can you explain the reason for this statement? 

Let a discrete random variable X assume values in the set $\{x_1, x_2, \ldots, x_n\}$. Show that the entropy of X
satisfies the inequality

$$H(X) \le \log n$$

and with equality if, and only if, the probability $p_i = 1/n$ for all i.

Lossless Data Compression 

Consider a discrete memoryless source whose alphabet consists of K equiprobable symbols. 

Explain why the use of a fixed-length code for the representation of such a source is about as 
efficient as any code can be. 

What conditions have to be satisfied by K and the codeword length for the coding efficiency to 
be 100%? 

Consider the four codes listed below:

Symbol    Code I    Code II    Code III    Code IV
$s_0$     0         0          0           00
$s_1$     10        01         01          01
$s_2$     110       001        011         10
$s_3$     1110      0010       110         110
$s_4$     1111      0011       111         111


Two of these four codes are prefix codes. Identify them and construct their individual decision 
trees. 

Apply the Kraft inequality to codes I, II, III, and IV. Discuss your results in light of those 
obtained in part a. 

Consider a sequence of letters of the English alphabet with their probabilities of occurrence 

Letter         a     i     l     m     n     o     p     y
Probability    0.1   0.1   0.2   0.1   0.1   0.2   0.1   0.1

Compute two different Huffman codes for this alphabet. In one case, move a combined symbol in 
the coding procedure as high as possible; in the second case, move it as low as possible. Hence, for 
each of the two codes, find the average codeword length and the variance of the average codeword 
length over the ensemble of letters. Comment on your results. 

A discrete memoryless source has an alphabet of seven symbols whose probabilities of occurrence 
are as described here: 

Symbol         $s_0$   $s_1$   $s_2$   $s_3$   $s_4$   $s_5$    $s_6$
Probability    0.25    0.25    0.125   0.125   0.125   0.0625   0.0625

Compute the Huffman code for this source, moving a “combined” symbol as high as possible.
Explain why the computed source code has an efficiency of 100%.

Consider a discrete memoryless source with alphabet $\{s_0, s_1, s_2\}$ and statistics {0.7, 0.15, 0.15} for
its output. 

Apply the Huffman algorithm to this source. Hence, show that the average codeword length of 
the Huffman code equals 1.3 bits/symbol. 

Let the source be extended to order two. Apply the Huffman algorithm to the resulting extended 
source and show that the average codeword length of the new code equals 1.1975 bits/symbol. 
Extend the order of the extended source to three and reapply the Huffman algorithm; hence, 
calculate the average codeword length. 

Compare the average codeword length calculated in parts b and c with the entropy of the original 
source. 

Figure P5.15 shows a Huffman tree. What is the codeword for each of the symbols A, B, C, D, E, F, 
and G represented by this Huffman tree? What are their individual codeword lengths? 



A computer executes four instructions that are designated by the codewords (00, 01, 10, 11). 
Assuming that the instructions are used independently with probabilities (1/2, 1/8, 1/8, 1/4), 
calculate the percentage by which the number of bits used for the instructions may be reduced by the 
use of an optimum source code. Construct a Huffman code to realize the reduction. 

Consider the following binary sequence 

11101001100010110100 ... 

Use the Lempel-Ziv algorithm to encode this sequence, assuming that the binary symbols 0 and 1
are already in the codebook.

Binary Symmetric Channel 

Consider the transition probability diagram of a binary symmetric channel shown in Figure 5.8. The 
input binary symbols 0 and 1 occur with equal probability. Find the probabilities of the binary 
symbols 0 and 1 appearing at the channel output. 

Repeat the calculation in Problem 5.18, assuming that the input binary symbols 0 and 1 occur with
probabilities 1/4 and 3/4, respectively.

Mutual Information and Channel Capacity

Consider a binary symmetric channel characterized by the transition probability p. Plot the mutual
information of the channel as a function of $p_1$, the a priori probability of symbol 1 at the channel
input. Do your calculations for the transition probability p = 0, 0.1, 0.2, 0.3, 0.5.

Revisiting (5.12), express the mutual information $I(X;Y)$ in terms of the relative entropy

$$D(p(x,y) \,\|\, p(x)p(y))$$

Figure 5.10 depicts the variation of the channel capacity of a binary symmetric channel with the 
transition probability p. Use the results of Problem 5.19 to explain this variation. 

Consider the binary symmetric channel described in Figure 5.8. Let $p_0$ denote the probability of
sending binary symbol $x_0 = 0$ and let $p_1 = 1 - p_0$ denote the probability of sending binary symbol
$x_1 = 1$. Let p denote the transition probability of the channel.

Show that the mutual information between the channel input and channel output is given by

$$I(X;Y) = H(z) - H(p)$$

where the two entropy functions are

$$H(z) = z \log_2\left( \frac{1}{z} \right) + (1 - z) \log_2\left( \frac{1}{1 - z} \right)$$

$$z = p_0 p + (1 - p_0)(1 - p)$$

and

$$H(p) = p \log_2\left( \frac{1}{p} \right) + (1 - p) \log_2\left( \frac{1}{1 - p} \right)$$

Show that the value of $p_0$ that maximizes I(X;Y) is equal to 1/2.

Hence, show that the channel capacity equals 

C = 1 - H(p) 

Two binary symmetric channels are connected in cascade as shown in Figure P5.24. Find the overall 
channel capacity of the cascaded connection, assuming that both channels have the same transition 
probability diagram of Figure 5.8. 



[Figure P5.24: Cascade connection: input, binary symmetric channel 1, binary symmetric channel 2,
output.]


The binary erasure channel has two inputs and three outputs as described in Figure P5.25. The
inputs are labeled 0 and 1 and the outputs are labeled 0, 1, and e. A fraction $\alpha$ of the incoming bits is
erased by the channel. Find the capacity of the channel.

[Figure P5.25: Binary erasure channel; the direct input-to-output transitions are labeled $1 - \alpha$.]

Consider a digital communication system that uses a repetition code for the channel encoding/decoding.
In particular, each transmission is repeated n times, where n = 2m + 1 is an odd integer. The decoder
operates as follows. If in a block of n received bits the number of 0s exceeds the number of 1s, then
the decoder decides in favor of a 0; otherwise, it decides in favor of a 1. An error occurs when m + 1
or more transmissions out of n = 2m + 1 are incorrect. Assume a binary symmetric channel.

For n = 3, show that the average probability of error is given by

$$P_e = 3p^2(1 - p) + p^3$$

where p is the transition probability of the channel.

For n = 5, show that the average probability of error is given by

$$P_e = 10p^3(1 - p)^2 + 5p^4(1 - p) + p^5$$

Hence, for the general case, deduce that the average probability of error is given by

$$P_e = \sum_{i=m+1}^{n} \binom{n}{i} p^i (1 - p)^{n-i}$$


Let X, Y, and Z be three discrete random variables. For each value of the random variable Z,
represented by sample z, define

$$A(z) = \sum_{x} \sum_{y} p(y)\, p(z|x, y)$$

Show that the conditional entropy H(X|Y) satisfies the inequality

$$H(X|Y) \le H(Z) + E[\log A]$$

where E is the expectation operator.

Consider two correlated discrete random variables X and Y, each of which takes a value in the set
$\{x_i\}_{i=1}^{n}$. Suppose that the value taken by Y is known. The requirement is to guess the value of X. Let
$P_e$ denote the probability of error, defined by

$$P_e = P[X \ne Y]$$

Show that $P_e$ is related to the conditional entropy of X given Y by the inequality

$$H(X|Y) \le H(P_e) + P_e \log(n - 1)$$

This inequality is known as Fano’s inequality. Hint: Use the result derived in Problem 5.27.


In this problem we explore the convexity of the mutual information $I(X;Y)$, involving the pair of
discrete random variables X and Y.

Consider a discrete memoryless channel, for which the transition probability p(y|x) is fixed for all x
and y. Let $X_1$ and $X_2$ be two input random variables, whose input probability distributions are
respectively denoted by $p(x_1)$ and $p(x_2)$. The corresponding probability distribution of X is defined
by the convex combination

$$p(x) = a_1 p(x_1) + a_2 p(x_2)$$

where $a_1$ and $a_2$ are arbitrary constants. Prove the inequality

$$I(X;Y) \ge a_1 I(X_1;Y_1) + a_2 I(X_2;Y_2)$$


where $X_1$, $X_2$, and X are the channel inputs, and $Y_1$, $Y_2$, and Y are the corresponding channel outputs.
For the proof, you may use the following form of Jensen’s inequality:

$$\sum_{x} \sum_{y} p_1(x, y) \log\left( \frac{p(y)}{p_1(y)} \right) \le \log\left[ \sum_{x} \sum_{y} p_1(x, y)\, \frac{p(y)}{p_1(y)} \right]$$

Differential Entropy

The differential entropy of a continuous random variable X is defined by the integral of (5.66). 
Similarly, the differential entropy of a continuous random vector X is defined by the integral of 
(5.68). These two integrals may not exist. Justify this statement. 

Show that the differential entropy of a continuous random variable X is invariant to translation; that is,

$$h(X + c) = h(X)$$

for some constant c.

Let $X_1, X_2, \ldots, X_n$ denote the elements of a Gaussian vector X. The $X_i$ are independent with mean $m_i$
and variance $\sigma_i^2$, i = 1, 2, ..., n. Show that the differential entropy of the vector X is given by

$$h(\mathbf{X}) = \frac{n}{2} \log_2\left[ 2\pi e \left( \sigma_1^2 \sigma_2^2 \cdots \sigma_n^2 \right)^{1/n} \right]$$

where e is the base of the natural logarithm. What does h(X) reduce to if the variances are all equal?

A continuous random variable X is constrained to a peak magnitude M; that is,

$$-M < X < M$$

Show that the differential entropy of X is maximum when it is uniformly distributed, as shown by

$$f_X(x) = \begin{cases} 1/(2M), & -M < x < M \\ 0, & \text{otherwise} \end{cases}$$

Determine the maximum differential entropy of X. 

Referring to (5.75), do the following:

Verify that the differential entropy of a Gaussian random variable of mean $\mu$ and variance $\sigma^2$ is
given by $\frac{1}{2} \log_2(2\pi e \sigma^2)$, where e is the base of the natural logarithm.

Hence, confirm the inequality of (5.75).

Demonstrate the properties of symmetry, nonnegativity, and expansion of the mutual information
$I(X;Y)$ described in Section 5.6.

Consider the continuous random variable Y, defined by 

Y = X + N 

where the random variables X and N are statistically independent. Show that the conditional 
differential entropy of Y, given X, equals 

h(Y | X) = h(N) 

where h(N) is the differential entropy of N. 

Information Capacity Law 

A voice-grade channel of the telephone network has a bandwidth of 3.4 kHz. 

Calculate the information capacity of the telephone channel for a signal-to-noise ratio of 30 dB. 
Calculate the minimum signal-to-noise ratio required to support information transmission 
through the telephone channel at the rate of 9600 bits/s. 

Alphanumeric data are entered into a computer from a remote terminal through a voice-grade 
telephone channel. The channel has a bandwidth of 3.4 kHz and output signal-to-noise ratio of 
20 dB. The terminal has a total of 128 symbols. Assume that the symbols are equiprobable and the 
successive transmissions are statistically independent. 

Calculate the information capacity of the channel. 

Calculate the maximum symbol rate for which error-free transmission over the channel is
possible.

A black-and-white television picture may be viewed as consisting of approximately $3 \times 10^5$
elements, each of which may occupy one of 10 distinct brightness levels with equal probability.
Assume that (1) the rate of transmission is 30 picture frames per second and (2) the signal-to-noise 
ratio is 30 dB. 

Using the information capacity law, calculate the minimum bandwidth required to support the 
transmission of the resulting video signal. 

In Section 5.10 we made the statement that it is easier to increase the information capacity of a
communication channel by expanding its bandwidth B than increasing the transmitted power for a
prescribed noise variance $N_0 B$. This statement assumes that the noise spectral density $N_0$ varies
inversely with B. Why is this inverse relationship the case?

In this problem, we revisit Example 5.10, which deals with coded binary antipodal signaling over an
additive white Gaussian noise (AWGN) channel. Starting with (5.105) and the underlying theory,
develop a software package for computing the minimum $E_b/N_0$ required for a given bit error rate,
where $E_b$ is the signal energy per bit, and $N_0/2$ is the noise spectral density. Hence, compute the
results plotted in parts a and b of Figure 5.16.

As mentioned in Example 5.10, the computation of the mutual information between the channel input 
and channel output is well approximated using Monte Carlo integration. To explain how this method 
works, consider a function g(y) that is difficult to sample randomly, which is indeed the case for the 
problem at hand. (For this problem, the function g(y) represents the complicated integrand in the
formula for the differential entropy of the channel output.) For the computation, proceed as follows:

Find an area A that includes the region of interest and that is easily sampled. 

Choose N points, uniformly randomly inside the area A. 

Then the Monte Carlo integration theorem states that the integral of the function g(y) with respect to 
y is approximately equal to the area A multiplied by the fraction of points that reside below the curve 
of g, as illustrated in Figure P5.41. The accuracy of the approximation improves with increasing N. 
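The two steps above can be sketched in Python as a hit-or-miss estimator. This is only a minimal sketch, not the software package the problem asks for: the function name, the bounding rectangle, and the stand-in integrand exp(-y^2/2) are assumptions made for illustration, and the sketch further assumes that g is nonnegative and bounded above by the chosen height over the interval of interest.

```python
import math
import random

def hit_or_miss_integral(g, a, b, height, n_points, seed=0):
    """Estimate the integral of g over [a, b] by hit-or-miss Monte Carlo.

    The easily sampled area A is the rectangle [a, b] x [0, height]; the
    estimate is A times the fraction of random points lying below the curve.
    """
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    hits = 0
    for _ in range(n_points):
        y = rng.uniform(a, b)
        v = rng.uniform(0.0, height)
        if v <= g(y):
            hits += 1
    area = (b - a) * height
    return area * hits / n_points

# Stand-in integrand (hypothetical): an unnormalized Gaussian bell curve.
def bell(y):
    return math.exp(-0.5 * y * y)

estimate = hit_or_miss_integral(bell, -4.0, 4.0, 1.0, 200_000)
print(f"estimate = {estimate:.4f}, sqrt(2*pi) = {math.sqrt(2.0 * math.pi):.4f}")
```

The statistical error of this estimator decreases in proportion to the reciprocal of the square root of N, so each additional decimal digit of accuracy costs roughly a hundredfold increase in the number of points.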



Notes 


1. According to Lucky (1989), the first mention of the term information theory by Shannon
occurred in a 1945 memorandum entitled “A mathematical theory of cryptography.” It is rather
curious that the term was never used in Shannon’s (1948) classic paper, which laid down the
foundations of information theory. For an introductory treatment of information theory, see Part 1 of
the book by McEliece (2004), Chapters 1-6. For an advanced treatment of this subject, viewed in a
rather broad context and treated with rigor and clarity of presentation, see Cover and Thomas
(2006).

For a collection of papers on the development of information theory (including the 1948 classic
paper by Shannon), see Slepian (1974). For a collection of the original papers published by Shannon,
see Sloane and Wyner (1993).

2. The use of a logarithmic measure of information was first suggested by Hartley (1928); however, 
Hartley used logarithms to base 10. 

3. In statistical physics, the entropy of a physical system is defined by (Reif, 1965: 147)

$$L = k_B \ln \Omega$$

where $k_B$ is Boltzmann’s constant, $\Omega$ is the number of states accessible to the system, and ln denotes
the natural logarithm. This entropy has the dimensions of energy, because its definition involves the
constant $k_B$. In particular, it provides a quantitative measure of the degree of randomness of the
system. Comparing the entropy of statistical physics with that of information theory, we see that they
have a similar form.

4. For the original proof of the source coding theorem, see Shannon (1948). A general proof of the 
source coding theorem is also given in Cover and Thomas (2006). The source coding theorem is also 
referred to in the literature as the noiseless coding theorem, noiseless in the sense that it establishes 
the condition for error-free encoding to be possible. 

5. For proof of the Kraft inequality, see Cover and Thomas (2006). The Kraft inequality is also 
referred to as the Kraft-McMillan inequality in the literature. 

6. The Huffman code is named after its inventor D.A. Huffman (1952). For a detailed account of 
Huffman coding and its use in data compaction, see Cover and Thomas (2006). 

7. The original papers on the Lempel-Ziv algorithm are Ziv and Lempel (1977, 1978). For detailed 
treatment of the algorithm, see Cover and Thomas (2006). 

8. It is also of interest to note that once a “parent” subsequence is joined by its two children, that 
parent subsequence can be replaced in constructing the Lempel-Ziv algorithm. To illustrate this nice 
feature of the algorithm, suppose we have the following example sequence: 

01 , 010 , 011 , ... 

where 01 plays the role of a parent and 010 and 011 play the roles of the parent’s children. In this
example, the algorithm removes the 01, thereby reducing the length of the table through the use of a 
pointer. 

9. In Cover and Thomas (2006), it is proved that the two-stage method, where the source coding and 
channel coding are considered separately as depicted in Figure 5.11, is as good as any other method 
of transmitting information across a noisy channel. This result has practical implications, in that the 
design of a communication system may be approached in two separate parts: source coding followed 
by channel coding. Specifically, we may proceed as follows: 

Design a source code for the most efficient representation of data generated by a discrete 
memoryless source of information. 

Separately and independently, design a channel code that is appropriate for a discrete 
memoryless channel. 

The combination of source coding and channel coding designed in this manner will be as efficient as 
anything that could be designed by considering the two coding problems jointly. 

10. To prove the channel-coding theorem, Shannon used several ideas that were new at the time; 
however, it was some time later when the proof was made rigorous (Cover and Thomas, 2006: 199). 
Perhaps the most thoroughly rigorous proof of this basic theorem of information theory is presented 
in Chapter 7 of the book by Cover and Thomas (2006). Our statement of the theorem, though 
slightly different from that presented by Cover and Thomas, in essence is the same. 

11. In the literature, the relative entropy is also referred to as the Kullback-Leibler divergence (KLD).

12. Equation (5.95) is also referred to in the literature as the Shannon-Hartley law in recognition of
the early work by Hartley on information transmission (Hartley, 1928). In particular, Hartley showed
that the amount of information that can be transmitted over a given channel is proportional to the
product of the channel bandwidth and the time of operation.

13. A lucid exposition of sphere packing is presented in Cover and Thomas (2006); see also 
Wozencraft and Jacobs (1965). 

14. Parts a and b of Figure 5.16 follow the corresponding parts of Figure 6.2 in the book by Frey 
(1998). 

15. For a rigorous treatment of information capacity of a colored noisy channel, see Gallager 
(1968). The idea of replacing the channel model of Figure 5.17a with that of Figure 5.17b is 
discussed in Gitlin, Hayes, and Weinstein (1992).

16. For a complete treatment of rate distortion theory, see the classic book by Berger (1971); this 
subject is also treated in somewhat less detail in Cover and Thomas (1991), McEliece (1977), and 
Gallager (1968). 

17. For the derivation of (5.124), see Cover and Thomas (2006). An algorithm for computation of 
the rate distortion function R(D) defined in (5.124) is described in Blahut (1987) and Cover and 
Thomas (2006). 


Conversion of Analog 
Waveforms into Coded Pulses 


Introduction 


In continuous-wave (CW) modulation, which was studied briefly in Chapter 2, some
parameter of a sinusoidal carrier wave is varied continuously in accordance with the
message signal. This is in direct contrast to pulse modulation, which we study in this
chapter. In pulse modulation, some parameter of a pulse train is varied in accordance with
the message signal. On this basis, we may distinguish two families of pulse modulation:

Analog pulse modulation, in which a periodic pulse train is used as the carrier wave 
and some characteristic feature of each pulse (e.g., amplitude, duration, or position) 
is varied in a continuous manner in accordance with the corresponding sample value 
of the message signal. Thus, in analog pulse modulation, information is transmitted 
basically in analog form but the transmission takes place at discrete times. 

Digital pulse modulation, in which the message signal is represented in a form that
is discrete in both time and amplitude, thereby permitting transmission of the
message in digital form as a sequence of coded pulses; this form of signal
transmission has no CW counterpart.

The use of coded pulses for the transmission of analog information-bearing signals 
represents a basic ingredient in digital communications. In this chapter, we focus attention 
on digital pulse modulation, which, in basic terms, is described as the conversion of 
analog waveforms into coded pulses. As such, the conversion may be viewed as the 
transition from analog to digital communications. 

Three different kinds of digital pulse modulation are studied in the chapter: 

Pulse-code modulation (PCM), which has emerged as the most favored scheme for 
the digital transmission of analog information-bearing signals (e.g., voice and video 
signals). The important advantages of PCM are summarized thus: 

• robustness to channel noise and interference; 

• efficient regeneration of the coded signal along the transmission path; 

• efficient exchange of increased channel bandwidth for improved signal-to- 
quantization noise ratio, obeying an exponential law; 

• a uniform format for the transmission of different kinds of baseband signals,
hence their integration with other forms of digital data in a common network;

• comparative ease with which message sources may be dropped or reinserted in a
multiplex system;

• secure communication through the use of special modulation schemes or 
encryption. 

These advantages, however, are attained at the cost of increased system complexity 
and increased transmission bandwidth. Simply stated: 


For every gain we make, there is a price to pay. 

Differential pulse-code modulation (DPCM), which exploits the use of lossy data 
compression to remove the redundancy inherent in a message signal, such as voice or 
video, so as to reduce the bit rate of the transmitted data without serious degradation 
in overall system response. In effect, increased system complexity is traded off for 
reduced bit rate, therefore reducing the bandwidth requirement of PCM. 

Delta modulation (DM), which addresses another practical limitation of PCM: the 
need for simplicity of implementation when it is a necessary requirement. DM 
satisfies this requirement by intentionally “oversampling” the message signal. In 
effect, increased transmission bandwidth is traded off for reduced system 
complexity. DM may therefore be viewed as the dual of DPCM. 

Although, indeed, these three methods of analog-to-digital conversion are quite different,
they do share two basic signal-processing operations, namely sampling and quantization:

• the process of sampling, followed by 

• pulse-amplitude modulation (PAM) and finally 

• amplitude quantization 

are studied in what follows in this order. 

Sampling Theory 


The sampling process is usually described in the time domain. As such, it is an operation 
that is basic to digital signal processing and digital communications. Through use of the 
sampling process, an analog signal is converted into a corresponding sequence of samples 
that are usually spaced uniformly in time. Clearly, for such a procedure to have practical 
utility, it is necessary that we choose the sampling rate properly in relation to the bandwidth 
of the message signal, so that the sequence of samples uniquely defines the original analog 
signal. This is the essence of the sampling theorem, which is derived in what follows. 


Consider an arbitrary signal g(t) of finite energy, which is specified for all time t. A
segment of the signal g(t) is shown in Figure 6.1a. Suppose that we sample the signal g(t)
instantaneously and at a uniform rate, once every $T_s$ seconds. Consequently, we obtain an
infinite sequence of samples spaced $T_s$ seconds apart and denoted by $\{g(nT_s)\}$, where n
takes on all possible integer values, positive as well as negative. We refer to $T_s$ as the
sampling period, and to its reciprocal $f_s = 1/T_s$ as the sampling rate. For obvious reasons,
this ideal form of sampling is called instantaneous sampling.

[Figure 6.1: The sampling process. (a) Analog signal. (b) Instantaneously sampled version of the
analog signal.]

Let g_δ(t) denote the signal obtained by individually weighting the elements of a
periodic sequence of delta functions spaced T_s seconds apart by the sequence of numbers
{g(nT_s)}, as shown by (see Figure 6.1b):

$$ g_\delta(t) = \sum_{n=-\infty}^{\infty} g(nT_s)\,\delta(t - nT_s) \tag{6.1} $$

We refer to g_δ(t) as the ideal sampled signal. The term δ(t − nT_s) represents a delta
function positioned at time t = nT_s. From the definition of the delta function, we recall from
Chapter 2 that such an idealized function has unit area. We may therefore view the
multiplying factor g(nT_s) in (6.1) as a “mass” assigned to the delta function δ(t − nT_s). A delta
function weighted in this manner is closely approximated by a rectangular pulse of
duration Δt and amplitude g(nT_s)/Δt; the smaller we make Δt, the better the approximation
will be.

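Referring to the table of Fourier-transform pairs in Table 2.2, we have

$$ g_\delta(t) \rightleftharpoons f_s \sum_{m=-\infty}^{\infty} G(f - mf_s) \tag{6.2} $$

where G(f) is the Fourier transform of the original signal g(t) and f_s is the sampling rate.
Equation (6.2) states that the process of uniformly sampling a continuous-time signal of finite
energy results in a periodic spectrum whose period equals the sampling rate f_s.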


Another useful expression for the Fourier transform of the ideal sampled signal g_δ(t) may
be obtained by taking the Fourier transform of both sides of (6.1) and noting that the
Fourier transform of the delta function δ(t − nT_s) is equal to exp(−j2πnfT_s). Letting G_δ(f)
denote the Fourier transform of g_δ(t), we may write

$$ G_\delta(f) = \sum_{n=-\infty}^{\infty} g(nT_s)\exp(-j2\pi n f T_s) \tag{6.3} $$

Equation (6.3) describes the discrete-time Fourier transform. It may be viewed as a
complex Fourier series representation of the periodic frequency function G_δ(f), with the
sequence of samples {g(nT_s)} defining the coefficients of the expansion.

The discussion presented thus far applies to any continuous-time signal g(t) of finite
energy and infinite duration. Suppose, however, that the signal g(t) is strictly band limited,
with no frequency components higher than W hertz. That is, the Fourier transform G(f) of
the signal g(t) has the property that G(f) is zero for |f| > W, as illustrated in Figure 6.2a;
the shape of the spectrum shown in this figure is merely intended for the purpose of
illustration. Suppose also that we choose the sampling period T_s = 1/(2W). Then the
corresponding spectrum G_δ(f) of the sampled signal g_δ(t) is as shown in Figure 6.2b.
Putting T_s = 1/(2W) in (6.3) yields

$$ G_\delta(f) = \sum_{n=-\infty}^{\infty} g\!\left(\frac{n}{2W}\right)\exp\!\left(-\frac{j\pi n f}{W}\right) \tag{6.4} $$

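Isolating the term on the right-hand side of (6.2) corresponding to m = 0, we readily see
that the Fourier transform of g_δ(t) may also be expressed as

$$ G_\delta(f) = f_s G(f) + f_s \sum_{\substack{m=-\infty \\ m \neq 0}}^{\infty} G(f - mf_s) \tag{6.5} $$

Suppose, now, we impose the following two conditions:

1. G(f) = 0 for |f| > W.
2. f_s = 2W.

We may then reduce (6.5) to

$$ G(f) = \frac{1}{2W}\,G_\delta(f), \qquad -W < f < W \tag{6.6} $$

Substituting (6.4) into (6.6), we may also write

$$ G(f) = \frac{1}{2W}\sum_{n=-\infty}^{\infty} g\!\left(\frac{n}{2W}\right)\exp\!\left(-\frac{j\pi n f}{W}\right), \qquad -W < f < W \tag{6.7} $$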

Equation (6.7) is the desired formula for the frequency-domain description of sampling.
This formula reveals that if the sample values g(n/2W) of the signal g(t) are specified for
all n, then the Fourier transform G(f) of that signal is uniquely determined. Because g(t) is
related to G(f) by the inverse Fourier transform, it follows, therefore, that g(t) is itself
uniquely determined by the sample values g(n/2W) for −∞ < n < ∞. In other words, the
sequence {g(n/2W)} has all the information contained in the original signal g(t).

Figure 6.2 (a) Spectrum of a strictly band-limited signal g(t). (b) Spectrum of the sampled
version of g(t) for a sampling period T_s = 1/(2W).

Consider next the problem of reconstructing the signal g(t) from the sequence of
sample values {g(n/2W)}. Substituting (6.7) in the formula for the inverse Fourier
transform

$$ g(t) = \int_{-\infty}^{\infty} G(f)\exp(j2\pi f t)\,df $$

and interchanging the order of summation and integration, which is permissible because
both operations are linear, we may go on to write

$$ g(t) = \sum_{n=-\infty}^{\infty} g\!\left(\frac{n}{2W}\right)\frac{1}{2W}\int_{-W}^{W}\exp\!\left[j2\pi f\left(t - \frac{n}{2W}\right)\right]df \tag{6.8} $$

The definite integral in (6.8), including the multiplying factor 1/(2W), is readily evaluated in
terms of the sinc function, as shown by

$$ \frac{1}{2W}\int_{-W}^{W}\exp\!\left[j2\pi f\left(t - \frac{n}{2W}\right)\right]df = \frac{\sin(2\pi W t - n\pi)}{2\pi W t - n\pi} = \operatorname{sinc}(2Wt - n) $$

Accordingly, (6.8) reduces to the infinite-series expansion

$$ g(t) = \sum_{n=-\infty}^{\infty} g\!\left(\frac{n}{2W}\right)\operatorname{sinc}(2Wt - n), \qquad -\infty < t < \infty \tag{6.9} $$

Equation (6.9) is the desired reconstruction formula. This formula provides the basis for
reconstructing the original signal g(t) from the sequence of sample values {g(n/2W)}, with
the sinc function sinc(2Wt) playing the role of a basis function of the expansion. Each
sample, g(n/2W), is multiplied by a delayed version of the basis function, sinc(2Wt − n),
and all the resulting individual waveforms in the expansion are added to reconstruct the
original signal g(t).
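
As a rough numerical check of the reconstruction formula (6.9), the following Python sketch
evaluates a truncated version of the series. The test signal g(t) = sinc²(Wt), the truncation
limit N, and all function names are assumptions made purely for illustration; none of them
appears in the text. The test signal is band limited to W hertz because its spectrum is
triangular over −W ≤ f ≤ W.

```python
import math

def sinc(x: float) -> float:
    """Normalized sinc function: sin(pi*x)/(pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def g(t: float, W: float) -> float:
    """Assumed test signal, band limited to W hertz: g(t) = sinc^2(W t)."""
    return sinc(W * t) ** 2

def take_samples(W: float, N: int) -> dict:
    """Samples g(n/2W) for n = -N, ..., N (a truncation of the infinite sequence)."""
    return {n: g(n / (2.0 * W), W) for n in range(-N, N + 1)}

def reconstruct(samples: dict, W: float, t: float) -> float:
    """Truncated form of (6.9): sum over n of g(n/2W) * sinc(2Wt - n)."""
    return sum(g_n * sinc(2.0 * W * t - n) for n, g_n in samples.items())

W = 1.0                                   # assumed bandwidth in hertz
samples = take_samples(W, N=200)          # assumed truncation limit
# Reconstruction error at a time instant that lies between two samples
error = abs(reconstruct(samples, W, 0.3) - g(0.3, W))
```

Because the series is truncated, the reconstruction between sampling instants is only
approximate; the residual error shrinks as N grows, at a rate that depends on how fast the
samples of the chosen signal decay.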


Equipped with the frequency-domain description of sampling given in (6.7) and the
reconstruction formula of (6.9), we may now state the sampling theorem for strictly
band-limited signals of finite energy in two equivalent parts:

1. A band-limited signal of finite energy that has no frequency components higher than
W hertz is completely described by specifying the values of the signal at instants of
time separated by 1/(2W) seconds.

2. A band-limited signal of finite energy that has no frequency components higher than
W hertz is completely recovered from a knowledge of its samples taken at the rate of
2W samples per second.

Part 1 of the theorem, following from (6.7), is performed in the transmitter. Part 2 of the
theorem, following from (6.9), is performed in the receiver. For a signal bandwidth of
W hertz, the sampling rate of 2W samples per second is called the Nyquist rate; its
reciprocal 1/(2W) (measured in seconds) is called the Nyquist interval; see the classic
paper (Nyquist, 1928b).

Derivation of the sampling theorem just described is based on the assumption that the
signal g(t) is strictly band limited. In practice, however, a message signal is not strictly band
limited, with the result that some degree of undersampling is encountered, as a consequence
of which aliasing is produced by the sampling process. Aliasing refers to the phenomenon
of a high-frequency component in the spectrum of the signal seemingly taking on the
identity of a lower frequency in the spectrum of its sampled version, as illustrated in Figure
6.3. The aliased spectrum, shown by the solid curve in Figure 6.3b, pertains to the
undersampled version of the message signal represented by the spectrum of Figure 6.3a.
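
The aliasing phenomenon can be made concrete with a single tone. The Python sketch below is
only an illustration under assumed numbers (a 5 kHz tone sampled at 8 kHz); the helper names
are hypothetical. It computes the lower frequency whose identity the undersampled tone takes
on, and generates the samples of both tones so that they can be compared.

```python
import math

def alias_frequency(f0: float, fs: float) -> float:
    """Apparent frequency, in the range 0 to fs/2, of a tone at f0 sampled at rate fs."""
    return abs(f0 - fs * round(f0 / fs))

def sample_tone(f0: float, fs: float, count: int) -> list:
    """Samples cos(2*pi*f0*n/fs) for n = 0, ..., count - 1."""
    return [math.cos(2.0 * math.pi * f0 * n / fs) for n in range(count)]

fs = 8000.0        # assumed sampling rate in hertz
f_high = 5000.0    # assumed tone above fs/2, hence undersampled
f_low = alias_frequency(f_high, fs)

# Largest difference between the two sample sequences
max_difference = max(
    abs(a - b)
    for a, b in zip(sample_tone(f_high, fs, 64), sample_tone(f_low, fs, 64))
)
```

With these numbers the two sample sequences agree to within floating-point rounding, so a
receiver that sees only the samples cannot tell the two tones apart.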

To combat the effects of aliasing in practice, we may use two corrective measures: 

Prior to sampling, a low-pass anti-aliasing filter is used to attenuate those high- 
frequency components of the signal that are not essential to the information being 
conveyed by the message signal g(t). 

The filtered signal is sampled at a rate slightly higher than the Nyquist rate. 

The use of a sampling rate higher than the Nyquist rate also has the beneficial effect of 
easing the design of the reconstruction filter used to recover the original signal from its 
sampled version. Consider the example of a message signal that has been anti-alias (low- 
pass) filtered, resulting in the spectrum shown in Figure 6.4a. The corresponding spectrum 
of the instantaneously sampled version of the signal is shown in Figure 6.4b, assuming a 
sampling rate higher than the Nyquist rate. According to Figure 6.4b, we readily see that 
design of the reconstruction filter may be specified as follows: 

• The reconstruction filter is low-pass with a passband extending from -W to W, 
which is itself determined by the anti-aliasing filter. 

• The reconstruction filter has a transition band extending (for positive frequencies)
from W to (f_s − W), where f_s is the sampling rate.

Figure 6.3 (a) Spectrum of a signal. (b) Spectrum of an undersampled version
of the signal, exhibiting the aliasing phenomenon.

Figure 6.4 (a) Anti-alias filtered spectrum of an information-bearing signal. (b) Spectrum of
the instantaneously sampled version of the signal, assuming the use of a sampling rate greater
than the Nyquist rate. (c) Magnitude response of the reconstruction filter.


Sampling of Voice Signals 

As an illustrative example, consider the sampling of voice signals for waveform coding. 
Typically, the frequency band, extending from 100 Hz to 3.1 kHz, is considered to be 
adequate for telephonic communication. This limited frequency band is accomplished by 
passing the voice signal through a low-pass filter with its cutoff frequency set at 3.1 kHz; 
such a filter may be viewed as an anti-aliasing filter. With such a cutoff frequency, the 
Nyquist rate is 2 × 3.1 = 6.2 kHz. The standard sampling rate for the waveform coding
of voice signals is 8 kHz. Putting these numbers together, design specifications for the
reconstruction (low-pass) filter in the receiver are as follows:

Cutoff frequency 3.1 kHz

Transition band 3.1 to 4.9 kHz

Transition bandwidth 1.8 kHz.
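
The arithmetic behind these specifications can be sketched in a few lines of Python. The
function name and the dictionary keys are hypothetical; the transition band is taken to
extend from W to f_s − W, following the description of the reconstruction filter given for
Figure 6.4.

```python
def reconstruction_filter_spec(W_khz: float, fs_khz: float) -> dict:
    """Reconstruction-filter figures for message bandwidth W and sampling rate fs (in kHz)."""
    if fs_khz < 2.0 * W_khz:
        raise ValueError("sampling rate is below the Nyquist rate")
    return {
        "nyquist_rate": 2.0 * W_khz,                  # minimum sampling rate
        "cutoff": W_khz,                              # passband edge
        "transition_band": (W_khz, fs_khz - W_khz),   # from W to fs - W
        "transition_bandwidth": fs_khz - 2.0 * W_khz,
    }

# Voice example: W = 3.1 kHz, standard sampling rate of 8 kHz
spec = reconstruction_filter_spec(W_khz=3.1, fs_khz=8.0)
```

Sampling exactly at the Nyquist rate would shrink the transition bandwidth to zero, which is
why a rate above the Nyquist rate eases the filter design.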
Pulse-Amplitude Modulation


Now that we understand the essence of the sampling process, we are ready to formally
define PAM, which is the simplest and most basic form of analog pulse modulation. It is
formally defined as follows:

PAM is a linear modulation process in which the amplitudes of regularly spaced pulses
are varied in proportion to the corresponding sample values of a continuous message
signal.

The pulses themselves can be of a rectangular form or some other appropriate shape.

The waveform of a PAM signal is illustrated in Figure 6.5. The dashed curve in this
figure depicts the waveform of a message signal m(t), and the sequence of amplitude-
modulated rectangular pulses shown as solid lines represents the corresponding PAM
signal s(t). There are two operations involved in the generation of the PAM signal:

1. Instantaneous sampling of the message signal m(t) every T_s seconds, where the
sampling rate f_s = 1/T_s is chosen in accordance with the sampling theorem.

2. Lengthening the duration of each sample so obtained to some constant value T.

In digital circuit technology, these two operations are jointly referred to as “sample and 
hold.” One important reason for intentionally lengthening the duration of each sample is to 
avoid the use of an excessive channel bandwidth, because bandwidth is inversely 
proportional to pulse duration. However, care has to be exercised in how long we make the 
sample duration T, as the following analysis reveals. 

Let s(t) denote the sequence of flat-top pulses generated in the manner described in
Figure 6.5. We may express the PAM signal as a discrete convolution sum:

$$ s(t) = \sum_{n=-\infty}^{\infty} m(nT_s)\,h(t - nT_s) \tag{6.10} $$

where T_s is the sampling period and m(nT_s) is the sample value of m(t) obtained at time
t = nT_s. The pulse h(t) is Fourier transformable. With spectral analysis of s(t) in mind, we
would like to recast (6.10) in the form of a convolution integral. To this end, we begin by
invoking the sifting property of a delta function (discussed in Chapter 2) to express the
delayed version of the pulse shape h(t − nT_s) in (6.10) as

$$ h(t - nT_s) = \int_{-\infty}^{\infty} h(t - \tau)\,\delta(\tau - nT_s)\,d\tau \tag{6.11} $$

Figure 6.5 Flat-top samples, representing an analog signal.

Hence, substituting (6.11) into (6.10), and interchanging the order of summation and
integration, we get

$$ s(t) = \int_{-\infty}^{\infty}\left[\sum_{n=-\infty}^{\infty} m(nT_s)\,\delta(\tau - nT_s)\right]h(t - \tau)\,d\tau \tag{6.12} $$

Referring to (6.1), we recognize that the expression inside the brackets in (6.12) is simply
the instantaneously sampled version of the message signal m(t), as shown by

$$ m_\delta(\tau) = \sum_{n=-\infty}^{\infty} m(nT_s)\,\delta(\tau - nT_s) \tag{6.13} $$

Accordingly, substituting (6.13) into (6.12), we may reformulate the PAM signal s(t) in the
desired form

$$ s(t) = \int_{-\infty}^{\infty} m_\delta(\tau)\,h(t - \tau)\,d\tau = m_\delta(t) \star h(t) \tag{6.14} $$

which is the convolution of the two time functions m_δ(t) and h(t).

The stage is now set for taking the Fourier transform of both sides of (6.14) and
recognizing that the convolution of two time functions is transformed into the
multiplication of their respective Fourier transforms; we get the simple result

$$ S(f) = M_\delta(f)\,H(f) \tag{6.15} $$

where S(f) = F[s(t)], M_δ(f) = F[m_δ(t)], and H(f) = F[h(t)]. Adapting (6.2) to the problem
at hand, we note that the Fourier transform M_δ(f) is related to the Fourier transform M(f)
of the original message signal m(t) as follows:

$$ M_\delta(f) = f_s \sum_{k=-\infty}^{\infty} M(f - kf_s) \tag{6.16} $$

where f_s is the sampling rate. Therefore, the substitution of (6.16) into (6.15) yields the
desired formula for the Fourier transform of the PAM signal s(t), as shown by

$$ S(f) = f_s \sum_{k=-\infty}^{\infty} M(f - kf_s)\,H(f) \tag{6.17} $$

Given this formula, how do we recover the original message signal m(t)? As a first step in
this reconstruction, we may pass s(t) through a low-pass filter whose frequency response is
defined in Figure 6.4c; here, it is assumed that the message signal is limited to bandwidth
W and the sampling rate f_s is larger than the Nyquist rate 2W. Then, from (6.17) we find
that the spectrum of the resulting filter output is equal to M(f)H(f). This output is
equivalent to passing the original message signal m(t) through another low-pass filter of
frequency response H(f).

Equation (6.17) applies to any Fourier-transformable pulse shape h(t).
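
The convolution view of (6.14) can be mimicked on a discrete time grid. The Python sketch
below is an illustration only: it assumes a grid with `period` points per sampling period T_s
and `hold` points per pulse duration T (with T no longer than T_s), and the function names
are hypothetical. A weighted impulse train is convolved with a unit-amplitude rectangular
pulse, which yields the flat-top (sample-and-hold) waveform.

```python
def impulse_train(samples: list, period: int) -> list:
    """Place each sample value on the grid, followed by period - 1 zeros."""
    train = []
    for value in samples:
        train.append(float(value))
        train.extend([0.0] * (period - 1))
    return train

def convolve(x: list, h: list) -> list:
    """Direct discrete convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def flat_top_pam(samples: list, period: int, hold: int) -> list:
    """Flat-top PAM as (impulse train) * (rectangular pulse of `hold` grid points)."""
    if not 1 <= hold <= period:
        raise ValueError("hold must satisfy 1 <= hold <= period")
    rectangular_pulse = [1.0] * hold
    return convolve(impulse_train(samples, period), rectangular_pulse)

# Three assumed sample values, 4 grid points per T_s, pulse held for 2 grid points
pam = flat_top_pam([1.0, -2.0, 0.5], period=4, hold=2)
```

Replacing the rectangular pulse with any other finite pulse shape changes only the second
argument of the convolution, in line with the generality of (6.17).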
Consider now the special case of a rectangular pulse of unit amplitude and duration T,
as shown in Figure 6.6a; specifically:

$$ h(t) = \begin{cases} 1, & 0 < t < T \\ \tfrac{1}{2}, & t = 0,\ t = T \\ 0, & \text{otherwise} \end{cases} \tag{6.18} $$

Correspondingly, the Fourier transform of h(t) is given by

$$ H(f) = T\,\operatorname{sinc}(fT)\exp(-j\pi fT) \tag{6.19} $$

which is plotted in Figure 6.6b. We therefore find from (6.17) that by using flat-top
samples to generate a PAM signal we have introduced amplitude distortion as well as a
delay of T/2. This effect is rather similar to the variation in transmission with frequency
that is caused by the finite size of the scanning aperture in television. Accordingly, the
distortion caused by the use of PAM to transmit an analog information-bearing signal is
referred to as the aperture effect.

Figure 6.6 (a) Rectangular pulse h(t). (b) Transfer function H(f), made up of the magnitude
|H(f)| and phase arg[H(f)].

To correct for this distortion, we connect an equalizer in cascade with the low-pass
reconstruction filter, as shown in Figure 6.7. The equalizer has the effect of decreasing the
in-band loss of the reconstruction filter as the frequency increases in such a manner as to
compensate for the aperture effect. In light of (6.19), the magnitude response of the
equalizer should ideally be

$$ \frac{1}{|H(f)|} = \frac{1}{T\,\operatorname{sinc}(fT)} = \frac{\pi f}{\sin(\pi fT)} $$

Figure 6.7 System for recovering message signal m(t) from PAM signal s(t).

The amount of equalization needed in practice is usually small. Indeed, for a duty cycle
defined by the ratio T/T_s ≤ 0.1, the amplitude distortion is less than 0.5%. In such a
situation, equalization may be omitted altogether.
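
The size of the aperture effect is easy to check numerically. The following Python sketch
evaluates the normalized magnitude |H(f)|/T = |sinc(fT)| of (6.19) and the ideal equalizer
gain for a duty cycle of 0.1. The sampling rate of 8 kHz, the choice of f_s/2 as the band
edge, and the function names are assumptions made for illustration.

```python
import math

def sinc(x: float) -> float:
    """Normalized sinc function: sin(pi*x)/(pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def aperture_gain(f: float, T: float) -> float:
    """|H(f)|/T = |sinc(fT)| from (6.19), normalized to unity at f = 0."""
    return abs(sinc(f * T))

def equalizer_gain(f: float, T: float) -> float:
    """Ideal equalizer magnitude 1/|H(f)| = pi*f/sin(pi*f*T), for 0 <= f < 1/T."""
    if f == 0.0:
        return 1.0 / T
    return math.pi * f / math.sin(math.pi * f * T)

fs = 8000.0          # assumed sampling rate in hertz
T = 0.1 / fs         # pulse duration for a duty cycle T/T_s = 0.1
f_edge = fs / 2.0    # assumed band edge: the highest frequency representable at rate fs

# Fractional loss in amplitude at the band edge relative to f = 0
distortion = 1.0 - aperture_gain(f_edge, T)
```

Under these assumptions the loss at the band edge comes out at roughly 0.4%, which is
consistent with the bound of 0.5% quoted for duty cycles up to 0.1.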


The transmission of a PAM signal imposes rather stringent requirements on the frequency 
response of the channel, because of the relatively short duration of the transmitted pulses. 
One other point should be noted: because PAM relies on amplitude as the parameter subject to
modulation, the noise performance of a PAM system can never be better than that of
baseband-signal transmission. Accordingly, in practice, we find that for transmission over a
communication channel PAM is used only as the preliminary means of message 
processing, whereafter the PAM signal is changed to some other more appropriate form of 
pulse modulation. 

With analog-to-digital conversion as the aim, what would be the appropriate form of 
modulation to build on PAM? Basically, there are three potential candidates, each with its 
own advantages and disadvantages, as summarized here: 

PCM, which, as remarked previously in Section 6.1, is robust but demanding in both 
transmission bandwidth and computational requirements. Indeed, PCM has 
established itself as the standard method for the conversion of speech and video 
signals into digital form. 

DPCM, which provides a method for the reduction in transmission bandwidth but at 
the expense of increased computational complexity. 

DM, which is relatively simple to implement but requires a significant increase in 
transmission bandwidth. 

Before we go on, a comment on terminology is in order. The term “modulation” used 
herein is a misnomer. In reality, PCM, DM, and DPCM are different forms of source 
coding, with source coding being understood in the sense described in Chapter 5 on 
information theory. Nevertheless, the terminologies used to describe them have become 
embedded in the digital communications literature, so much so that we just have to live 
with them. 

Despite their basic differences, PCM, DPCM and DM do share an important feature: 
the message signal is represented in discrete form in both time and amplitude. PAM takes 
care of the discrete-time representation. As for the discrete-amplitude representation, we 
resort to a process known as quantization, which is discussed next. 
Quantization and its Statistical Characterization


Typically, an analog message signal (e.g., voice) has a continuous range of amplitudes 
and, therefore, its samples have a continuous amplitude range. In other words, within the 
finite amplitude range of the signal, we find an infinite number of amplitude levels. In 
actual fact, however, it is not necessary to transmit the exact amplitudes of the samples for 
the following reason: any human sense (the ear or the eye) as ultimate receiver can detect 
only finite intensity differences. This means that the message signal may be approximated 
by a signal constructed of discrete amplitudes selected on a minimum error basis from an 
available set. The existence of a finite number of discrete amplitude levels is a basic 
condition of waveform coding exemplified by PCM. Clearly, if we assign the discrete 
amplitude levels with sufficiently close spacing, then we may make the approximated 
signal practically indistinguishable from the original message signal. For a formal
definition of amplitude quantization, or just quantization for short, we say:

Quantization is the process of transforming the sample amplitude m(nT_s) of a message
signal m(t) at time t = nT_s into a discrete amplitude v(nT_s) taken from a finite set of
possible amplitudes.

This definition assumes that the quantizer (i.e., the device performing the quantization
process) is memoryless and instantaneous, which means that the transformation at time
t = nT_s is not affected by earlier or later samples of the message signal m(t). This simple
form of scalar quantization, though not optimum, is commonly used in practice.

When dealing with a memoryless quantizer, we may simplify the notation by dropping
the time index. Henceforth, the symbol m_k is used in place of m(kT_s), as indicated in the
block diagram of a quantizer shown in Figure 6.8a. Then, as shown in Figure 6.8b, the
signal amplitude m is specified by the index k if it lies inside the partition cell

$$ \mathcal{J}_k: \{ m_k < m \le m_{k+1} \}, \qquad k = 1, 2, \ldots, L \tag{6.20} $$

where

$$ m_k = m(kT_s) \tag{6.21} $$

and L is the total number of amplitude levels used in the quantizer. The discrete amplitudes
m_k, k = 1, 2, ..., L, at the quantizer input are called decision levels or decision thresholds. At
the quantizer output, the index k is transformed into an amplitude v_k that represents all
amplitudes of the cell J_k; the discrete amplitudes v_k, k = 1, 2, ..., L, are called representation
levels or reconstruction levels. The spacing between two adjacent representation levels is
called a quantum or step-size. Thus, given a quantizer denoted by g(·), the quantized output
v equals v_k if the input sample m belongs to the interval J_k. In effect, the mapping (see
Figure 6.8a)

$$ v = g(m) \tag{6.22} $$

defines the quantizer characteristic, described by a staircase function.

Figure 6.8 Description of a memoryless quantizer. (a) Block diagram: continuous sample m at
the input and discrete sample v at the output of the quantizer g(·). (b) Partition cell, showing
the decision levels m_{k−1}, m_k, m_{k+1}, m_{k+2} and the representation level v_k.

Figure 6.9 Two types of quantization: (a) midtread and (b) midrise.


Quantizers can be of a uniform or nonuniform type. In a uniform quantizer, the 
representation levels are uniformly spaced; otherwise, the quantizer is nonuniform. In this 
section, we consider only uniform quantizers; nonuniform quantizers are considered in 
Section 6.5. The quantizer characteristic can also be of midtread or midrise type. Figure 
6.9a shows the input-output characteristic of a uniform quantizer of the midtread type, 
which is so called because the origin lies in the middle of a tread of the staircaselike graph. 
Figure 6.9b shows the corresponding input-output characteristic of a uniform quantizer of 
the midrise type, in which the origin lies in the middle of a rising part of the staircaselike 
graph. Despite their different appearances, both the midtread and midrise types of uniform 
quantizers illustrated in Figure 6.9 are symmetric about the origin. 
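
The two staircase characteristics can be written compactly. The Python sketch below is a
minimal illustration with hypothetical function names; it assumes an unbounded number of
levels, so overload of the quantizer is not modeled.

```python
import math

def midtread(m: float, step: float) -> float:
    """Uniform midtread quantizer: levels at integer multiples of step (zero is a level)."""
    return step * math.floor(m / step + 0.5)

def midrise(m: float, step: float) -> float:
    """Uniform midrise quantizer: levels at odd multiples of step/2 (zero is a threshold)."""
    return step * (math.floor(m / step) + 0.5)

inputs = [-0.6, -0.2, 0.2, 0.6]                       # assumed input samples
midtread_out = [midtread(m, 1.0) for m in inputs]     # unit step size
midrise_out = [midrise(m, 1.0) for m in inputs]
```

Small inputs on either side of the origin map to zero in the midtread case, whereas the
midrise quantizer assigns them to the levels at plus or minus half a step.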


Inevitably, the use of quantization introduces an error defined as the difference between 
the continuous input sample m and the quantized output sample v. The error is called 
quantization noise. Figure 6.10 illustrates a typical variation of quantization noise as a 
function of time, assuming the use of a uniform quantizer of the midtread type. 

Let the quantizer input m be the sample value of a zero-mean random variable M. (If
the input has a nonzero mean, we can always remove it by subtracting the mean from the
input and then adding it back after quantization.) A quantizer, denoted by g(·), maps the
input random variable M of continuous amplitude into a discrete random variable V; their
respective sample values m and v are related by the nonlinear function g(·) in (6.22). Let
the quantization error be denoted by the random variable Q of sample value q. We may
thus write

$$ q = m - v \tag{6.23} $$

or, correspondingly,

$$ Q = M - V \tag{6.24} $$

Figure 6.10 Illustration of the quantization process.

With the input M having zero mean and the quantizer assumed to be symmetric as in
Figure 6.9, it follows that the quantizer output V and, therefore, the quantization error Q
will also have zero mean. Thus, for a partial statistical characterization of the quantizer in
terms of output signal-to-(quantization) noise ratio, we need only find the mean-square
value of the quantization error Q.

Consider, then, an input m of continuous amplitude, which, symmetrically, occupies the
range [−m_max, m_max]. Assuming a uniform quantizer of the midrise type illustrated in
Figure 6.9b, we find that the step size of the quantizer is given by

$$ \Delta = \frac{2m_{\max}}{L} \tag{6.25} $$

where L is the total number of representation levels. For a uniform quantizer, the
quantization error Q will have its sample values bounded by −Δ/2 ≤ q ≤ Δ/2. If the step size
Δ is sufficiently small (i.e., the number of representation levels L is sufficiently large), it is
reasonable to assume that the quantization error Q is a uniformly distributed random
variable and the interfering effect of the quantization error on the quantizer input is similar
to that of thermal noise, hence the reference to quantization error as quantization noise.
We may thus express the probability density function of the quantization noise as

$$ f_Q(q) = \begin{cases} \dfrac{1}{\Delta}, & -\dfrac{\Delta}{2} < q \le \dfrac{\Delta}{2} \\ 0, & \text{otherwise} \end{cases} \tag{6.26} $$

For this to be true, however, we must ensure that the incoming continuous sample does not
overload the quantizer. Then, with the mean of the quantization noise being zero, its
variance σ_Q² is the same as the mean-square value; that is,

$$ \sigma_Q^2 = E[Q^2] = \int_{-\Delta/2}^{\Delta/2} q^2 f_Q(q)\,dq \tag{6.27} $$

Substituting (6.26) into (6.27), we get

$$ \sigma_Q^2 = \frac{1}{\Delta}\int_{-\Delta/2}^{\Delta/2} q^2\,dq = \frac{\Delta^2}{12} \tag{6.28} $$

Typically, the L-ary number k, denoting the kth representation level of the quantizer, is
transmitted to the receiver in binary form. Let R denote the number of bits per sample used
in the construction of the binary code. We may then write

$$ L = 2^R \tag{6.29} $$

or, equivalently,

$$ R = \log_2 L \tag{6.30} $$

Hence, substituting (6.29) into (6.25), we get the step size

$$ \Delta = \frac{2m_{\max}}{2^R} \tag{6.31} $$

Thus, the use of (6.31) in (6.28) yields

$$ \sigma_Q^2 = \frac{1}{3}\,m_{\max}^2\,2^{-2R} \tag{6.32} $$

Let P denote the average power of the original message signal m(t). We may then express
the output signal-to-noise ratio of a uniform quantizer as

$$ (\mathrm{SNR})_O = \frac{P}{\sigma_Q^2} = \left(\frac{3P}{m_{\max}^2}\right)2^{2R} \tag{6.33} $$

Equation (6.33) shows that the output signal-to-noise ratio of a uniform quantizer (SNR)_O
increases exponentially with increasing number of bits per sample R, which is intuitively
satisfying.
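
The variance formula (6.32) can be checked by simulation. The Python sketch below quantizes
random samples with a uniform midrise quantizer of L = 2^R levels and compares the measured
mean-square error with the theoretical value. The input is assumed to be uniformly
distributed over [−m_max, m_max], a choice made here so that the quantization error is
exactly uniform; the seed, the sample count, and the function names are likewise
illustrative assumptions.

```python
import math
import random

def quantize_midrise(m: float, m_max: float, R: int) -> float:
    """Uniform midrise quantizer with L = 2^R levels over [-m_max, m_max]."""
    L = 2 ** R
    step = 2.0 * m_max / L                      # step size, as in (6.31)
    k = math.floor(m / step)                    # index of the cell containing m
    k = max(-(L // 2), min(L // 2 - 1, k))      # keep the index inside the L cells
    return step * (k + 0.5)

def theoretical_noise_variance(m_max: float, R: int) -> float:
    """Quantization-noise variance from (6.32)."""
    return (m_max ** 2) / 3.0 * 2.0 ** (-2 * R)

def measured_noise_variance(m_max: float, R: int, count: int, seed: int) -> float:
    """Average of q^2 = (m - v)^2 over `count` uniformly distributed inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(count):
        m = rng.uniform(-m_max, m_max)
        q = m - quantize_midrise(m, m_max, R)
        total += q * q
    return total / count

measured = measured_noise_variance(1.0, 4, 100_000, seed=1)
theory = theoretical_noise_variance(1.0, 4)
```

For other input distributions the uniform-error assumption holds only approximately, and
the agreement improves as the step size becomes small relative to the signal range.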


Sinusoidal Modulating Signal 

Consider the special case of a full-load sinusoidal modulating signal of amplitude A_m,
which utilizes all the representation levels provided. The average signal power is
(assuming a load of 1 Ω)

$$ P = \frac{A_m^2}{2} $$

The total range of the quantizer input is 2A_m, because the modulating signal swings
between −A_m and A_m. We may, therefore, set m_max = A_m, in which case the use of (6.32)
yields the average power (variance) of the quantization noise as

$$ \sigma_Q^2 = \frac{1}{3}\,A_m^2\,2^{-2R} $$

Thus, the output signal-to-noise ratio of a uniform quantizer, for a full-load test tone, is

$$ (\mathrm{SNR})_O = \frac{A_m^2/2}{A_m^2\,2^{-2R}/3} = \frac{3}{2}\left(2^{2R}\right) $$

Expressing the signal-to-noise ratio (SNR) in decibels, we get

$$ 10\log_{10}(\mathrm{SNR})_O = 1.8 + 6R $$

The corresponding values of signal-to-noise ratio for various values of L and R are given in
Table 6.1. For sinusoidal modulation, this table provides a basis for making a quick estimate
of the number of bits per sample required for a desired output signal-to-noise ratio.


Table 6.1 Signal-to-(quantization) noise ratio for varying number of
representation levels for sinusoidal modulation

Number of representation levels L    Number of bits per sample R    SNR (dB)
32                                   5                              31.8
64                                   6                              37.8
128                                  7                              43.8
256                                  8                              49.8
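
The entries of the table follow directly from the decibel formula, as the short Python
sketch below illustrates. The function names are hypothetical. The sketch also evaluates
(3/2)2^{2R} in decibels without rounding the constants, and inverts the rounded formula to
estimate the number of bits per sample needed for a target signal-to-noise ratio.

```python
import math

def snr_db_rule(R: int) -> float:
    """Rounded decibel formula from the text: 1.8 + 6R."""
    return 1.8 + 6.0 * R

def snr_db_exact(R: int) -> float:
    """Decibel value of (3/2) * 2^(2R), without rounding the constants."""
    return 10.0 * math.log10(1.5 * 2.0 ** (2 * R))

def bits_for_target(target_db: float) -> int:
    """Smallest R (at least 1) whose rounded-formula SNR meets the target."""
    return max(1, math.ceil((target_db - 1.8) / 6.0))

# Rows of (L, R, SNR in dB) for R = 5, ..., 8
table = [(2 ** R, R, round(snr_db_rule(R), 1)) for R in range(5, 9)]
```

The rounded formula slightly understates the unrounded value, by a little over 0.1 dB at
R = 8, which is immaterial for a quick estimate.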


In designing a scalar quantizer, the challenge is how to select the representation levels and 
surrounding partition cells so as to minimize the average quantization power for a fixed 
number of representation levels. 

To state the problem in mathematical terms: consider a message signal m(t) drawn from
a stationary process and whose dynamic range, denoted by −A ≤ m ≤ A, is partitioned into
a set of L cells, as depicted in Figure 6.11. The boundaries of the partition cells are defined
by a set of real numbers m_1, m_2, ..., m_{L+1} that satisfy the following three conditions:

$$ m_1 = -A $$
$$ m_{L+1} = A $$
$$ m_k \le m_{k+1} \quad \text{for } k = 1, 2, \ldots, L $$

Figure 6.11 Illustrating the partitioning of the dynamic range −A ≤ m ≤ A of a message
signal m(t) into a set of L cells, with boundaries m_1 = −A, m_2, m_3, ..., m_{L−1}, m_L,
m_{L+1} = +A spanning a total range of 2A.

The kth partition cell is defined by (6.20), reproduced here for convenience:

$$ \mathcal{J}_k: m_k < m \le m_{k+1} \quad \text{for } k = 1, 2, \ldots, L $$

Let the representation levels (i.e., quantization values) be denoted by v_k, k = 1, 2, ..., L.
Then, assuming that d(m, v_k) denotes a distortion measure for using v_k to represent all
those values of the input m that lie inside the partition cell J_k, the goal is to find the two
sets {v_k}_{k=1}^{L} and {J_k}_{k=1}^{L} that minimize the average distortion

$$ D = \sum_{k=1}^{L}\int_{m\in\mathcal{J}_k} d(m, v_k)\,f_M(m)\,dm $$

where f_M(m) is the probability density function of the random variable M with sample
value m.

A commonly used distortion measure is defined by

$$ d(m, v_k) = (m - v_k)^2 $$

in which case we speak of the mean-square distortion. In any event, the optimization problem
stated herein is nonlinear, defying an explicit, closed-form solution. To get around this
difficulty, we resort to an algorithmic approach for solving the problem in an iterative manner.

Structurally speaking, the quantizer consists of two components with interrelated 
design parameters: 

• An encoder characterized by the set of partition cells {J_k}_{k=1}^{L}; this is located in the
transmitter.

• A decoder characterized by the set of representation levels {v_k}_{k=1}^{L}; this is located
in the receiver.

Accordingly, we may identify two critically important conditions that provide the
mathematical basis for all algorithmic solutions to the optimum quantization problem.
One condition assumes that we are given a decoder and the problem is to find the optimum
encoder in the transmitter. The other condition assumes that we are given an encoder and
the problem is to find the optimum decoder in the receiver. Henceforth, these two
conditions are referred to as condition I and II, respectively.

Optimality of the Encoder for a Given Decoder

The availability of a decoder means that we have a certain codebook in mind. Let the
codebook be defined by

$$ \mathcal{C}: \{v_k\}_{k=1}^{L} $$

Given the codebook 𝒞, the problem is to find the set of partition cells {J_k}_{k=1}^{L} that
minimizes the mean-square distortion D. That is, we wish to find the encoder defined by
the nonlinear mapping

$$ g(m) = v_k, \qquad k = 1, 2, \ldots, L \tag{6.40} $$

such that we have

$$ D = \int_{-A}^{A} d(m, g(m))\,f_M(m)\,dm \;\ge\; \sum_{k=1}^{L}\int_{m\in\mathcal{J}_k}\left[\min_{v_k\in\mathcal{C}} d(m, v_k)\right]f_M(m)\,dm \tag{6.41} $$

For the lower bound specified in (6.41) to be attained, we require that the nonlinear
mapping of (6.40) be satisfied only if the condition

$$ d(m, v_k) \le d(m, v_j) \quad \text{holds for all } j \ne k \tag{6.42} $$

The necessary condition described in (6.42) for optimality of the encoder for a specified
codebook 𝒞 is recognized as the nearest-neighbor condition. In words, the nearest-neighbor
condition requires that the partition cell J_k should embody all those values of the
input m that are closer to v_k than any other element of the codebook 𝒞. This optimality
condition is indeed intuitively satisfying.

Optimality of the Decoder for a Given Encoder 

Consider next the reverse situation to that described under condition I, which may be 
stated as follows: optimize the codebook %= { v k } L k _ j for the decoder, given that the 
set of partition cells {Ji c ) k _ j characterizing the encoder is fixed. The criterion for 
optimization is the average (mean-square) distortion: 

D = 

k = i 

The probability density function f^{m) is clearly independent of the codebook %. Hence, 
differentiating D with respect to the representation level v k , we readily obtain 

fr = - 2 £l m .jJ m - v M m) dm 

k k = 1 

Setting dD/dv k equal to zero and then solving for v k , we obtain the optimum value 

_LsJ k m fM( m ) dm 

V k, opt j. 

The denominator in (6.45) is just the probability p k that the random variable M with 
sample value m lies in the partition cell J k , as shown by 

p k = P(m k < M < m k + l) 

= dm 

Accordingly, we may interpret the optimality condition of (6.45) as choosing the 
representation level v k to equal the conditional mean of the random variable M, given that 
M lies in the partition cell J k . We can thus formally state that the condition for optimality 
of the decoder for a given encoder as follows: 

v fc,opt = E[ M\m k <M<m k+l \ 

where E is the expectation operator. Equation (6.47) is also intuitively satisfying. 

Note that the nearest neighbor condition (I) for optimality of the encoder for a given 
decoder was proved for a generic average distortion. However, the conditional mean 
requirement (condition II) for optimality of the decoder for a given encoder was proved for 
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the special case of a mean-square distortion. In any event, these two conditions are 
necessary for optimality of a scalar quantizer. Basically, the algorithm for designing the 
quantizer consists of alternately optimizing the encoder in accordance with condition I, 
then optimizing the decoder in accordance with condition II, and continuing in this 
manner until the average distortion D reaches a minimum. The optimum quantizer 
designed in this manner is called the Lloyd-Max quantizer. 
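The alternating procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the probability density function is replaced by a finite set of training samples (here drawn from a zero-mean, unit-variance Gaussian source with a fixed seed), the number of levels is taken as L = 4, and the names `lloyd_max` and `mean_square_distortion` are hypothetical helpers, not part of any library.

```python
import random

def lloyd_max(samples, codebook, iterations=30):
    """Alternate conditions I and II on a finite set of training samples."""
    codebook = list(codebook)
    for _ in range(iterations):
        # Condition I (nearest neighbor): put each sample in the cell of its closest level.
        cells = [[] for _ in codebook]
        for m in samples:
            k = min(range(len(codebook)), key=lambda i: (m - codebook[i]) ** 2)
            cells[k].append(m)
        # Condition II (conditional mean): move each level to the mean of its cell.
        # An empty cell keeps its previous level.
        codebook = [sum(c) / len(c) if c else v for c, v in zip(cells, codebook)]
    return codebook

def mean_square_distortion(samples, codebook):
    """Average squared error when each sample is replaced by its nearest level."""
    return sum(min((m - v) ** 2 for v in codebook) for m in samples) / len(samples)

rng = random.Random(0)                                 # fixed seed, illustrative data only
samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # stand-in for the source M
lo, hi = min(samples), max(samples)
uniform = [lo + (k + 0.5) * (hi - lo) / 4 for k in range(4)]  # L = 4 uniform levels
optimized = lloyd_max(samples, uniform)

print("uniform  :", mean_square_distortion(samples, uniform))
print("optimized:", mean_square_distortion(samples, optimized))
```

Each pass through the loop cannot increase the average distortion measured on the training samples, which is why the iteration settles at a (possibly local) minimum.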

Pulse-Code Modulation 


With the material on sampling, PAM, and quantization presented in the preceding 
sections, the stage is set for describing PCM, for which we offer the following definition: 


Specifically, the transmitter consists of two components: a pulse-amplitude modulator followed 
by an analog-to-digital (A/D) converter. The latter component itself embodies a quantizer 
followed by an encoder. The receiver performs the inverse of these two operations: digital-to-analog
(D/A) conversion followed by pulse-amplitude demodulation. The communication
channel is responsible for transporting the encoded pulses from the transmitter to the receiver. 

Figure 6.12, a block diagram of the PCM, shows the transmitter, the transmission path 
from the transmitter output to the receiver input, and the receiver. 

It is important to realize, however, that once distortion in the form of quantization noise 
is introduced into the encoded pulses, there is absolutely nothing that can be done at the 
receiver to compensate for that distortion. The only design precaution that can be taken is 
to choose a number of representation levels in the receiver that is large enough to ensure 
that the quantization noise is imperceptible for human use at the receiver output. 



Figure 6.12 Block diagram of PCM system.

The incoming message signal is sampled with a train of rectangular pulses short enough to
closely approximate the instantaneous sampling process. To ensure perfect reconstruction of
the message signal at the receiver, the sampling rate must be greater than twice the highest
frequency component W of the message signal in accordance with the sampling theorem. In
practice, a low-pass anti-aliasing filter is used at the front end of the pulse-amplitude
modulator to exclude, before sampling, frequencies greater than W, which are of
negligible practical importance. Thus, the application of sampling permits the reduction of
the continuously varying message signal to a limited number of discrete values per second.


The PAM representation of the message signal is then quantized in the analog-to-digital 
converter, thereby providing a new representation of the signal that is discrete in both time 
and amplitude. The quantization process may follow a uniform law as described in Section 
6.4. In telephonic communication, however, it is preferable to use a variable separation 
between the representation levels for efficient utilization of the communication channel. 
Consider, for example, the quantization of voice signals. Typically, we find that the range 
of voltages covered by voice signals, from the peaks of loud talk to the weak passages of 
weak talk, is on the order of 1000 to 1. By using a nonuniform quantizer with the feature 
that the step size increases as the separation from the origin of the input-output amplitude 
characteristic of the quantizer is increased, the large end-steps of the quantizer can take 
care of possible excursions of the voice signal into the large amplitude ranges that occur 
relatively infrequently. In other words, the weak passages needing more protection are 
favored at the expense of the loud passages. In this way, a nearly uniform percentage 
precision is achieved throughout the greater part of the amplitude range of the input signal. 
The end result is that fewer steps are needed than would be the case if a uniform quantizer 
were used; hence the improvement in channel utilization. 

Assuming memoryless quantization, the use of a nonuniform quantizer is equivalent to
passing the message signal through a compressor and then applying the compressed signal
to a uniform quantizer, as illustrated in Figure 6.13a. A particular form of compression law
that is used in practice is the so-called $\mu$-law, which is defined by

$$|v| = \frac{\ln(1 + \mu|m|)}{\ln(1 + \mu)} \tag{6.48}$$

where ln, i.e., $\log_e$, denotes the natural logarithm, m and v are the input and output
voltages of the compressor, and $\mu$ is a positive constant. It is assumed that m and,
therefore, v are scaled so that they both lie inside the interval [-1, 1]. The $\mu$-law is plotted
for three different values of $\mu$ in Figure 6.14a. The case of uniform quantization
corresponds to $\mu = 0$. For a given value of $\mu$, the reciprocal slope of the compression curve
that defines the quantum steps is given by the derivative of the absolute value |m| with
respect to the corresponding absolute value |v|; that is,

$$\frac{d|m|}{d|v|} = \frac{\ln(1 + \mu)}{\mu}(1 + \mu|m|) \tag{6.49}$$

From (6.49) it is apparent that the $\mu$-law is neither strictly linear nor strictly logarithmic.
Rather, it is approximately linear at low input levels corresponding to $\mu|m| \ll 1$ and
approximately logarithmic at high input levels corresponding to $\mu|m| \gg 1$.

Figure 6.13 (a) Nonuniform quantization of the message signal in the transmitter. (b) Uniform quantization of the original message signal in the receiver.

Figure 6.14 Compression laws: (a) $\mu$-law; (b) A-law.

Another compression law that is used in practice is the so-called A-law, defined by 


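$$|v| = \begin{cases} \dfrac{A|m|}{1 + \ln A}, & 0 \le |m| \le \dfrac{1}{A} \\[2ex] \dfrac{1 + \ln(A|m|)}{1 + \ln A}, & \dfrac{1}{A} \le |m| \le 1 \end{cases} \tag{6.50}$$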


where A is another positive constant. Equation (6.50) is plotted in Figure 6.14b for varying 
A. The case of uniform quantization corresponds to A = 1. The reciprocal slope of this 
second compression curve is given by the derivative of \m\ with respect to | v |, as shown by 


$$\frac{d|m|}{d|v|} = \begin{cases} \dfrac{1 + \ln A}{A}, & 0 \le |m| \le \dfrac{1}{A} \\[2ex] (1 + \ln A)|m|, & \dfrac{1}{A} \le |m| \le 1 \end{cases} \tag{6.51}$$
To restore the signal samples to their correct relative level, we must, of course, use a device
in the receiver with a characteristic complementary to the compressor. Such a device is 
called an expander. Ideally, the compression and expansion laws are exactly the inverse of 
each other. With this provision in place, we find that, except for the effect of quantization, the 
expander output is equal to the compressor input. The cascade combination of a compressor 
and an expander, depicted in Figure 6.13, is called a compander. 

For both the $\mu$-law and A-law, the dynamic range capability of the compander improves
with increasing $\mu$ and A, respectively. The SNR for low-level signals increases at the expense
of the SNR for high-level signals. To accommodate these two conflicting requirements (i.e.,
a reasonable SNR for both low- and high-level signals), a compromise is usually made in
choosing the value of parameter $\mu$ for the $\mu$-law and parameter A for the A-law. The typical
values used in practice are $\mu$ = 255 for the $\mu$-law and A = 87.6 for the A-law.
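To make the compander idea concrete, the following Python sketch applies the $\mu$-law compressor, a uniform quantizer, and the matching expander to a weak input sample. It is a minimal illustration rather than a standard-compliant codec: the 256-level midrise quantizer, the test amplitude 0.01, and the function names are assumptions chosen for the example.

```python
import math

MU = 255.0  # typical value quoted in the text

def compress(m, mu=MU):
    """mu-law compressor: |v| = ln(1 + mu|m|) / ln(1 + mu), with m assumed in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(m)) / math.log1p(mu), m)

def expand(v, mu=MU):
    """Expander: the exact inverse of compress."""
    return math.copysign(math.expm1(abs(v) * math.log1p(mu)) / mu, v)

def uniform_quantize(x, levels):
    """Midrise uniform quantizer on [-1, 1] with the given number of levels."""
    step = 2.0 / levels
    index = min(int((x + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step

def compand(m, levels=256):
    """Compressor, uniform quantizer, and expander in cascade."""
    return expand(uniform_quantize(compress(m), levels))

weak = 0.01  # a weak passage, far below full scale (illustrative value)
err_uniform = abs(uniform_quantize(weak, 256) - weak)
err_companded = abs(compand(weak) - weak)
print("uniform quantizer error  :", err_uniform)
print("companded quantizer error:", err_companded)
```

With the same number of levels, the weak sample is reproduced more accurately by the companded quantizer, which is the sense in which the weak passages are favored at the expense of the loud ones.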


Through the combined use of sampling and quantization, the specification of an analog 
message signal becomes limited to a discrete set of values, but not in the form best suited 
to transmission over a telephone line or radio link. To exploit the advantages of sampling 
and quantizing for the purpose of making the transmitted signal more robust to noise, 
interference, and other channel impairments, we require the use of an encoding process to 
translate the discrete set of sample values to a more appropriate form of signal. Any plan 
for representing each of this discrete set of values as a particular arrangement of discrete 
events constitutes a code. Table 6.2 describes the one-to-one correspondence between 
representation levels and codewords for a binary number system for R = 4 bits per sample. 
Following the terminology of Chapter 5, the two symbols of a binary code are customarily 
denoted as 0 and 1. In practice, the binary code is the preferred choice for encoding for the
following reason: 


The last signal-processing operation in the transmitter is that of line coding, the purpose of
which is to represent each binary codeword by a sequence of pulses; for example,
symbol 1 is represented by the presence of a pulse and symbol 0 is represented by the absence
of the pulse. Line codes are discussed in Section 6.10. Suppose that, in a binary code, each
codeword consists of R bits. Then, using such a code, we may represent a total of $2^R$
distinct numbers. For example, a sample quantized into one of 256 levels may be
represented by an 8-bit codeword.


Table 6.2 Binary number system for R = 4 bits/sample

The first operation in the receiver of a PCM system is to regenerate (i.e., reshape and clean
up) the received pulses. These clean pulses are then regrouped into codewords and decoded
(i.e., mapped back) into a quantized pulse-amplitude modulated signal. The decoding
process involves generating a pulse the amplitude of which is the linear sum of all the pulses
in the codeword. Each pulse is weighted by its place value ($2^0, 2^1, 2^2, \ldots, 2^{R-1}$) in the code,
where R is the number of bits per sample. Note, however, that whereas the analog-to-digital
converter in the transmitter involves both quantization and encoding, the digital-to-analog
converter in the receiver involves decoding only, as illustrated in Figure 6.12.
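The encoding and decoding operations described above amount to writing a representation-level index in binary and then summing the place values of the received pulses. The short Python sketch below illustrates this for R = 4 bits per sample; the function names and the most-significant-bit-first ordering are assumptions made for the example.

```python
def encode(level, bits):
    """Map a representation-level index (0 .. 2**bits - 1) to a binary codeword, MSB first."""
    if not 0 <= level < 2 ** bits:
        raise ValueError("level index out of range for this codeword length")
    return [(level >> i) & 1 for i in reversed(range(bits))]

def decode(codeword):
    """Weight each pulse by its place value and form the linear sum."""
    bits = len(codeword)
    return sum(b * 2 ** (bits - 1 - i) for i, b in enumerate(codeword))

R = 4
for level in range(2 ** R):
    codeword = encode(level, R)
    print(level, codeword, decode(codeword))
```

Every one of the $2^R = 16$ level indices is recovered exactly, which reflects the one-to-one correspondence between representation levels and codewords.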

The final operation in the receiver is that of signal reconstruction. Specifically, an 
estimate of the original message signal is produced by passing the decoder output through 
a low-pass reconstruction filter whose cutoff frequency is equal to the message 
bandwidth W. Assuming that the transmission link (connecting the receiver to the 
transmitter) is error free, the reconstructed message signal includes no noise with the 
exception of the initial distortion introduced by the quantization process. 


The most important feature of a PCM system is its ability to control the effects of
distortion and noise produced by transmitting a PCM signal through the channel
connecting the receiver to the transmitter. This capability is accomplished by
reconstructing the PCM signal through a chain of regenerative repeaters, located at
sufficiently close spacing along the transmission path.
Figure 6.15 Block diagram of regenerative repeater (input: distorted PCM wave; output: regenerated PCM wave).


As illustrated in Figure 6.15, three basic functions are performed in a regenerative
repeater: equalization, timing, and decision making. The equalizer shapes the received
pulses so as to compensate for the effects of amplitude and phase distortions produced by
the non-ideal transmission characteristics of the channel. The timing circuitry provides a
periodic pulse train, derived from the received pulses, for sampling the equalized pulses at
the instants of time where the SNR is a maximum. Each sample so extracted is compared
with a predetermined threshold in the decision-making device. In each bit interval, a
decision is then made on whether the received symbol is 1 or 0 by observing whether the
threshold is exceeded or not. If the threshold is exceeded, a clean new pulse representing
symbol 1 is transmitted to the next repeater; otherwise, another clean new pulse representing
symbol 0 is transmitted. In this way, it is possible for the accumulation of distortion and
noise in a repeater span to be almost completely removed, provided that the disturbance is
not too large to cause an error in the decision-making process. Ideally, except for delay, the
regenerated signal is exactly the same as the signal originally transmitted. In practice, however,
the regenerated signal departs from the original signal for two main reasons:

The unavoidable presence of channel noise and interference causes the repeater to 
make wrong decisions occasionally, thereby introducing bit errors into the 
regenerated signal. 

If the spacing between received pulses deviates from its assigned value, a jitter is 
introduced into the regenerated pulse position, thereby causing distortion. 

The important point to take from this subsection on PCM is the fact that regeneration 
along the transmission path is provided across the spacing between individual regenerative 
repeaters (including the last stage of regeneration at the receiver input) provided that the 
spacing is short enough. If the transmitted SNR is high enough, then the regenerated
PCM data stream is the same as the transmitted PCM data stream, except for a practically 
negligibly small bit error rate (BER). In other words, under these operating conditions, 
performance degradation in the PCM system is essentially confined to quantization noise 
in the transmitter. 
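The benefit of regeneration can be illustrated with a small simulation. The Python sketch below models only the decision-making device of the repeater (equalization and timing are assumed ideal), represents symbols 1 and 0 by pulses of amplitude +1 and -1, and adds independent Gaussian noise in each of ten repeater spans. The noise level, the number of spans, and the comparison against a chain without regeneration are assumptions chosen purely for illustration.

```python
import random

def regenerate(samples, threshold=0.0, amplitude=1.0):
    """Decision-making device: emit a clean new pulse for each sampled value."""
    return [amplitude if s > threshold else -amplitude for s in samples]

def add_noise(samples, sigma, rng):
    """One repeater span: additive Gaussian noise of standard deviation sigma."""
    return [s + rng.gauss(0.0, sigma) for s in samples]

rng = random.Random(1)                          # fixed seed for repeatability
bits = [rng.randint(0, 1) for _ in range(2000)]
sent = [1.0 if b else -1.0 for b in bits]

regenerated = sent
unregenerated = sent
for _ in range(10):                             # ten spans along the path
    regenerated = regenerate(add_noise(regenerated, 0.2, rng))
    unregenerated = add_noise(unregenerated, 0.2, rng)   # noise accumulates

errors_regen = sum(r != s for r, s in zip(regenerated, sent))
errors_plain = sum(d != s for d, s in zip(regenerate(unregenerated), sent))
print("bit errors with regeneration   :", errors_regen)
print("bit errors without regeneration:", errors_plain)
```

Because each repeater makes a fresh decision, the noise of one span is removed before the next span adds its own, whereas without regeneration the noise contributions of all spans add up before the single final decision.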

Noise Considerations in PCM Systems 


The performance of a PCM system is influenced by two major sources of noise: 

Channel noise, which is introduced anywhere between the transmitter output and the 
receiver input; channel noise is always present, once the equipment is switched on. 
Quantization noise, which is introduced in the transmitter and is carried all the way 
along to the receiver output; unlike channel noise, quantization noise is signal 
dependent, in the sense that it disappears when the message signal is switched off. 
Naturally, these two sources of noise appear simultaneously once the PCM system is in
operation. However, the traditional practice is to consider them separately, so that we may 
develop insight into their individual effects on the system performance. 

The main effect of channel noise is to introduce bit errors into the received signal. In 
the case of a binary PCM system, the presence of a bit error causes symbol 1 to be 
mistaken for symbol 0, or vice versa. Clearly, the more frequently bit errors occur, the 
more dissimilar the receiver output becomes compared with the original message signal. 
The fidelity of information transmission by PCM in the presence of channel noise may be 
measured in terms of the average probability of symbol error, which is defined as the 
probability that the reconstructed symbol at the receiver output differs from the 
transmitted binary symbol on the average. The average probability of symbol error, also 
referred to as the BER, assumes that all the bits in the original binary wave are of equal 
importance. When, however, there is more interest in reconstructing the analog waveform of
the original message signal, different symbol errors may be weighted differently; for 
example, an error in the most significant bit in a codeword (representing a quantized 
sample of the message signal) is more harmful than an error in the least significant bit. 

To optimize system performance in the presence of channel noise, we need to minimize 
the average probability of symbol error. For this evaluation, it is customary to model the
channel as an ideal additive white Gaussian noise (AWGN) channel. The effect of
channel noise can be made practically negligible by using an adequate signal energy-to-noise
density ratio through the provision of short-enough spacing between the regenerative
repeaters in the PCM system. In such a situation, the performance of the PCM system is 
essentially limited by quantization noise acting alone. 

From the discussion of quantization noise presented in Section 6.4, we recognize that 
quantization noise is essentially under the designer’s control. It can be made negligibly 
small through the use of an adequate number of representation levels in the quantizer and 
the selection of a companding strategy matched to the characteristics of the type of 
message signal being transmitted. We thus find that the use of PCM offers the possibility 
of building a communication system that is rugged with respect to channel noise on a scale 
that is beyond the capability of any analog communication system; hence its use as a 
standard against which other waveform coders (e.g., DPCM and DM) are compared. 


The underlying theory of BER calculation in a PCM system is deferred to Chapter 8. For
the present, it suffices to say that the average probability of symbol error in a binary
encoded PCM receiver due to AWGN depends solely on $E_b/N_0$, which is defined as the
ratio of the transmitted signal energy per bit, $E_b$, to the noise spectral density, $N_0$. Note that
the ratio $E_b/N_0$ is dimensionless even though the quantities $E_b$ and $N_0$ have different
physical meaning. In Table 6.3, we present a summary of this dependence for the case of a
binary PCM system, in which symbols 1 and 0 are represented by rectangular pulses of
equal but opposite amplitudes. The results presented in the last column of the table assume
a bit rate of $10^5$ bits/s.

Table 6.3 Influence of $E_b/N_0$ on the probability of error

From Table 6.3 it is clear that there is an error threshold (at about 11 dB). For $E_b/N_0$
below the error threshold the receiver performance involves significant numbers of errors,
and above it the effect of channel noise is practically negligible. In other words, provided
that the ratio $E_b/N_0$ exceeds the error threshold, channel noise has virtually no effect on
the receiver performance, which is precisely the goal of PCM. When, however, $E_b/N_0$
drops below the error threshold, there is a sharp increase in the rate at which errors occur
in the receiver. Because decision errors result in the construction of incorrect codewords,
we find that when the errors are frequent, the reconstructed message at the receiver output
bears little resemblance to the original message signal.
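The sharpness of the error threshold can be appreciated with a short calculation. The Python sketch below assumes the standard result for binary signaling with equal-and-opposite rectangular pulses and matched-filter detection in AWGN, namely $P_e = \frac{1}{2}\,\mathrm{erfc}(\sqrt{E_b/N_0})$, whose derivation the text defers to Chapter 8. The list of $E_b/N_0$ values is illustrative and is not a reproduction of Table 6.3; the bit rate of $10^5$ bits/s follows the text.

```python
import math

BIT_RATE = 1e5  # bits/s, as assumed in the text

def bit_error_probability(ebn0_db):
    """Pe = 0.5 * erfc(sqrt(Eb/N0)) for polar binary signaling in AWGN (assumed model)."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)     # convert decibels to a plain ratio
    return 0.5 * math.erfc(math.sqrt(ebn0))

for db in (4.0, 8.0, 10.0, 12.0, 14.0):  # illustrative operating points
    pe = bit_error_probability(db)
    seconds_per_error = 1.0 / (pe * BIT_RATE)
    print(f"{db:5.1f} dB   Pe = {pe:.2e}   about one error every {seconds_per_error:.3g} s")
```

A change of only a few decibels around the threshold moves the system from frequent errors to errors so rare that channel noise can be ignored in practice.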

An important characteristic of a PCM system is its ruggedness to interference, caused 
by impulsive noise or cross-channel interference. The combined presence of channel noise 
and interference causes the error threshold necessary for satisfactory operation of the PCM 
system to increase. If, however, an adequate margin over the error threshold is provided in 
the first place, the system can withstand the presence of relatively large amounts of 
interference. In other words, a PCM system is robust with respect to channel noise and 
interference, providing further confirmation to the point made in the previous section that 
performance degradation in PCM is essentially confined to quantization noise in the 
transmitter. 


Consider now a PCM system that is known to operate above the error threshold, in which 
case we would be justified to ignore the effect of channel noise. In other words, the noise 
performance of the PCM system is essentially determined by quantization noise acting 
alone. Given such a scenario, how does the PCM system fare compared with the 
information capacity law, derived in Chapter 5? 

To address this question of practical importance, suppose that the system uses a 
codeword consisting of n symbols with each symbol representing one of M possible 
discrete amplitude levels; hence the reference to the system as an “M-ary” PCM system.
For this system to operate above the error threshold, there must be provision for a large 
enough noise margin. 

For the PCM system to operate above the error threshold as proposed, the requirement
is for a noise margin that is sufficiently large to maintain a negligible error rate due to
channel noise. This, in turn, means there must be a certain separation between the M
discrete amplitude levels. Call this separation $c\sigma$, where c is a constant and $\sigma^2 = N_0 B$ is the
noise variance measured in a channel bandwidth B. The number of amplitude levels M is
usually an integer power of 2. The average transmitted power will be least if the amplitude
range is symmetrical about zero. Then, the discrete amplitude levels, normalized with
respect to the separation $c\sigma$, will have the values ±1/2, ±3/2, ..., ±(M - 1)/2. We assume
that these M different amplitude levels are equally likely. Accordingly, we find that the
average transmitted power is given by

$$P = \frac{2}{M}\left[\left(\frac{1}{2}\right)^2 + \left(\frac{3}{2}\right)^2 + \cdots + \left(\frac{M-1}{2}\right)^2\right](c\sigma)^2 = c^2\sigma^2\left(\frac{M^2 - 1}{12}\right) \tag{6.52}$$
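The closed form for the average transmitted power, normalized by $(c\sigma)^2$, is the mean square of the equally likely levels ±1/2, ±3/2, ..., ±(M - 1)/2 and should equal $(M^2 - 1)/12$. The following Python sketch checks this with exact rational arithmetic for a few values of M; the function name is a hypothetical helper introduced for the check.

```python
from fractions import Fraction

def normalized_average_power(M):
    """Mean square of the levels +-1/2, +-3/2, ..., +-(M-1)/2, assumed equally likely."""
    positive_levels = [Fraction(2 * i - 1, 2) for i in range(1, M // 2 + 1)]
    # Each positive level has a mirror image, hence the factor 2/M.
    return Fraction(2, M) * sum(a * a for a in positive_levels)

for M in (2, 4, 8, 16, 32):
    direct = normalized_average_power(M)
    closed_form = Fraction(M * M - 1, 12)
    print(M, direct, closed_form)
    assert direct == closed_form
```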


Suppose that the M-ary PCM system described herein is used to transmit a message signal
with its highest frequency component equal to W hertz. The signal is sampled at the
Nyquist rate of 2W samples per second. We assume that the system uses a quantizer of the
midrise type, with L equally likely representation levels. Hence, the probability of
occurrence of any one of the L representation levels is 1/L. Correspondingly, the amount
of information carried by a single sample of the signal is $\log_2 L$ bits. With a maximum
sampling rate of 2W samples per second, the maximum rate of information transmission of
the PCM system measured in bits per second is given by

$$R_b = 2W \log_2 L \ \text{bits/s} \tag{6.53}$$

Since the PCM system uses a codeword consisting of n code elements with each one
having M possible discrete amplitude values, we have $M^n$ different possible codewords.
For a unique encoding process, therefore, we require

$$L = M^n \tag{6.54}$$

Clearly, the rate of information transmission in the system is unaffected by the use of an
encoding process. We may, therefore, eliminate L between (6.53) and (6.54) to obtain

$$R_b = 2Wn \log_2 M \ \text{bits/s} \tag{6.55}$$


Equation (6.52) defines the average transmitted power required to maintain an M-ary PCM
system operating above the error threshold. Hence, solving this equation for the number of
discrete amplitude levels, we may express the number M in terms of the average
transmitted power P and channel noise variance $\sigma^2 = N_0 B$ as follows:

$$M = \left(1 + \frac{12P}{c^2 N_0 B}\right)^{1/2} \tag{6.56}$$

Therefore, substituting (6.56) into (6.55), we obtain

$$R_b = Wn \log_2\left(1 + \frac{12P}{c^2 N_0 B}\right) \tag{6.57}$$


The channel bandwidth B required to transmit a rectangular pulse of duration 1/(2nW),
representing a symbol in the codeword, is given by

$$B = \kappa nW \tag{6.58}$$

where $\kappa$ is a constant with a value lying between 1 and 2. Using the minimum possible
value $\kappa = 1$, we find that the channel bandwidth B = nW. We may thus rewrite (6.57) as

$$R_b = B \log_2\left(1 + \frac{12P}{c^2 N_0 B}\right) \tag{6.59}$$

which defines the upper bound on the information capacity realizable by an M-ary PCM
system.

From Chapter 5 we recall that, in accordance with Shannon’s information capacity law,
the ideal transmission system is described by the formula

$$C = B \log_2\left(1 + \frac{P}{N_0 B}\right) \ \text{bits/s} \tag{6.60}$$


The most interesting point derived from the comparison of (6.59) with (6.60) is the fact 
that (6.59) is of the right mathematical form in an information-theoretic context. To be 
more specific, we make the following statement: 


As a corollary, we may go on to state: 


From the study of noise in analog modulation systems, it is known that the use of 
frequency modulation provides the best improvement in SNR. To be specific, when
the carrier-to-noise ratio is high enough, the bandwidth-noise trade-off follows a square 
law in frequency modulation (FM). Accordingly, in comparing the noise performance of 
FM with that of PCM we make the concluding statement: 


Indeed, this statement is further testimony for the PCM being viewed as a standard for 
waveform coding. 
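The comparison between the PCM bound (6.59) and the information capacity law (6.60) can also be examined numerically. The Python sketch below evaluates $R_b = B\log_2(1 + 12P/(c^2 N_0 B))$ and $C = B\log_2(1 + P/(N_0 B))$; the numerical values of P, $N_0$, B, and the constant c are hypothetical and serve only to illustrate the relationship.

```python
import math

def pcm_rate(P, N0, B, c):
    """Upper bound on the rate of an M-ary PCM system: B*log2(1 + 12P/(c^2 N0 B))."""
    return B * math.log2(1.0 + 12.0 * P / (c * c * N0 * B))

def shannon_capacity(P, N0, B):
    """Information capacity of the ideal system: B*log2(1 + P/(N0 B))."""
    return B * math.log2(1.0 + P / (N0 * B))

# Hypothetical operating point, chosen only for illustration.
P, N0, B, c = 1e-3, 1e-9, 1e4, 7.0

print("ideal system capacity :", shannon_capacity(P, N0, B))
print("PCM at the same power :", pcm_rate(P, N0, B, c))
print("PCM with power c^2/12 times larger:", pcm_rate(P * c * c / 12.0, N0, B, c))
```

Both expressions have the same logarithmic form in the bandwidth B; the PCM system attains the rate of the ideal system only when its average transmitted power is raised by the factor $c^2/12$.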

Prediction-Error Filtering for Redundancy Reduction 


When a voice or video signal is sampled at a rate slightly higher than the Nyquist rate, as 
usually done in PCM, the resulting sampled signal is found to exhibit a high degree of 
correlation between adjacent samples. The meaning of this high correlation is that, in an 
average sense, the signal does not change rapidly from one sample to the next. As a result, 
the difference between adjacent samples has a variance that is smaller than the variance of 
the original signal. When these highly correlated samples are encoded, as in the standard 
PCM system, the resulting encoded signal contains redundant information. This kind of 
signal structure means that symbols that are not absolutely essential to the transmission of 
information are generated as a result of the conventional encoding process described in
Section 6.5. By reducing this redundancy before encoding, we obtain a more efficient coded
signal, which is the basic idea behind DPCM. Discussion of this latter form of waveform
coding is deferred to the next section. In this section we discuss prediction-error filtering,
which provides a method for redundancy reduction and, therefore, improved waveform coding.

To elaborate, consider the block diagram of Figure 6.16a, which includes: 

• a direct forward path from the input to the output; 

• a predictor in the forward direction as well; and 

• a comparator for computing the difference between the input signal and the 
predictor output. 

The difference signal, so computed, is called the prediction error. Correspondingly, a filter 
that operates on the message signal to produce the prediction error, illustrated in Figure 
6.16a, is called a prediction-error filter. 

To simplify the presentation, let

$$m_n = m(nT_s) \tag{6.61}$$

denote a sample of the message signal m(t) taken at time $t = nT_s$. Then, with $\hat{m}_n$ denoting
the corresponding predictor output, the prediction error is defined by

$$e_n = m_n - \hat{m}_n \tag{6.62}$$

where $e_n$ is the amount by which the predictor fails to predict the input sample $m_n$ exactly.
In any case, the objective is to design the predictor so as to minimize the variance of the
prediction error $e_n$. In so doing, we effectively end up using a smaller number of bits to
represent $e_n$ than the original message sample $m_n$; hence, the need for a smaller
transmission bandwidth.

Figure 6.16 Block diagram of (a) prediction-error filter and (b) its inverse.
The prediction-error filter operates on the message signal on a sample-by-sample basis
to produce the prediction error. With such an operation performed in the transmitter, how
do we recover the original message signal from the prediction error at the receiver? To
address this fundamental question in a simple-minded and yet practical way, we invoke the
use of linearity. Let the operator L denote the action of the predictor, as shown by

$$\hat{m}_n = L[m_n] \tag{6.63}$$

Accordingly, we may rewrite (6.62) in operator form as follows:

$$e_n = m_n - L[m_n] = (1 - L)[m_n] \tag{6.64}$$

Under the assumption of linearity, we may invert (6.64) to recover the message sample
from the prediction error, as shown by

$$m_n = \left(\frac{1}{1 - L}\right)[e_n] \tag{6.65}$$

Equation (6.65) is immediately recognized as the equation of a feedback system, as
illustrated in Figure 6.16b. Most importantly, in functional terms, this feedback system
may be viewed as the inverse of prediction-error filtering.
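The inverse relationship between the prediction-error filter and the feedback system can be demonstrated with a short Python sketch. For simplicity, the predictor is assumed here to be the simplest possible linear one, which predicts each sample as the previous sample; this choice, the test sequence, and the function names are assumptions made for illustration.

```python
def predictor(past):
    """A very simple linear predictor: the prediction is the previous sample (0 at start)."""
    return past[-1] if past else 0.0

def prediction_error_filter(message):
    """Transmitter side: e_n = m_n - (prediction from past message samples)."""
    errors, past = [], []
    for m in message:
        errors.append(m - predictor(past))
        past.append(m)
    return errors

def inverse_filter(errors):
    """Receiver side: feedback loop, m_n = e_n + (prediction from recovered samples)."""
    recovered = []
    for e in errors:
        recovered.append(e + predictor(recovered))
    return recovered

message = [0.0, 0.5, 0.9, 1.0, 0.8, 0.3]   # slowly varying, illustrative samples
errors = prediction_error_filter(message)
recovered = inverse_filter(errors)
print("prediction errors:", errors)
print("recovered samples:", recovered)
```

The receiver applies the same predictor to its own recovered samples, so the prediction added back in the feedback loop matches the one subtracted in the transmitter.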


To simplify the design of the linear predictor in Figure 6.16, we propose to use a discrete-time
structure in the form of a finite-duration impulse response (FIR) filter, which is well known in
the digital signal-processing literature. The FIR filter was briefly discussed in Chapter 2.
Figure 6.17 depicts an FIR filter, consisting of two functional components:

• a set of p unit-delay elements, each of which is represented by $z^{-1}$, and

• a corresponding set of adders used to sum the scaled versions of the delayed inputs,
$m_{n-1}, m_{n-2}, \ldots, m_{n-p}$.

The overall linearly predicted output is thus defined by the convolution sum

$$\hat{m}_n = \sum_{k=1}^{p} w_k m_{n-k} \tag{6.66}$$

where p is called the prediction order. Minimization of the prediction-error variance is
achieved by a proper choice of the FIR filter coefficients as described next.
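As a concrete illustration of the convolution sum, the following Python sketch computes the predicted sample from the p most recent past samples. The function name and the example coefficients are assumptions chosen for illustration; the coefficients (2, -1) correspond to a straight-line extrapolation through the last two samples.

```python
def fir_predict(history, weights):
    """Convolution sum: sum over k = 1..p of w_k * m_{n-k}.

    history[-1] holds m_{n-1}, history[-2] holds m_{n-2}, and so on;
    weights[0] holds w_1, weights[1] holds w_2, and so on.
    """
    p = len(weights)
    if len(history) < p:
        raise ValueError("need at least p past samples")
    return sum(w * history[-k] for k, w in enumerate(weights, start=1))

past_samples = [1.0, 2.0, 3.0]          # m_{n-3}, m_{n-2}, m_{n-1}
weights = [2.0, -1.0]                   # p = 2, illustrative coefficients
print(fir_predict(past_samples, weights))   # 2*3.0 - 1*2.0 = 4.0
```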


Figure 6.17 Block diagram of an FIR filter of order p.

First, however, we make the following assumption:


This assumption may be satisfied by processing the message signal on a block-by-block 
basis, with each block being just long enough to satisfy the assumption in a pseudo-stationary
manner. For example, a block duration of 40 ms is considered to be adequate
for voice signals. 

With the random variable $M_n$ assumed to have zero mean, it follows that the variance of
the prediction error $e_n$ is the same as its mean-square value. We may thus define

$$J = \mathbb{E}[e_n^2] \tag{6.67}$$

as the index of performance. Substituting (6.62) and (6.66) into (6.67) and then expanding
terms, the index of performance is expressed as follows:

$$J(\mathbf{w}) = \mathbb{E}[m_n^2] - 2\sum_{k=1}^{p} w_k \mathbb{E}[m_n m_{n-k}] + \sum_{j=1}^{p}\sum_{k=1}^{p} w_j w_k \mathbb{E}[m_{n-j} m_{n-k}] \tag{6.68}$$

Moreover, under the above assumption of pseudo-stationarity, we may go on to introduce
the following second-order statistical parameters for $m_n$ treated as a sample of the
stochastic process M(t) at $t = nT_s$:

Variance

$$\sigma_M^2 = \mathbb{E}[(m_n - \mathbb{E}[m_n])^2] = \mathbb{E}[m_n^2] \quad \text{for } \mathbb{E}[m_n] = 0 \tag{6.69}$$

Autocorrelation function

$$R_{M,k-j} = \mathbb{E}[m_{n-j} m_{n-k}] \tag{6.70}$$

Note that to simplify the notation in (6.67) to (6.70), we have applied the expectation
operator $\mathbb{E}$ to samples rather than the corresponding random variables.

In any event, using (6.69) and (6.70), we may reformulate the index of performance of
(6.68) in the new form involving statistical parameters:

$$J(\mathbf{w}) = \sigma_M^2 - 2\sum_{k=1}^{p} w_k R_{M,k} + \sum_{j=1}^{p}\sum_{k=1}^{p} w_j w_k R_{M,k-j} \tag{6.71}$$

Differentiating this index of performance with respect to the filter coefficients, setting the
resulting expression equal to zero, and then rearranging terms, we obtain the following
system of simultaneous equations:

$$\sum_{j=1}^{p} w_{o,j} R_{M,k-j} = R_{M,k}, \quad k = 1, 2, \ldots, p \tag{6.72}$$

where $w_{o,j}$ is the optimal value of the jth filter coefficient $w_j$. This optimal set of equations
is the discrete-time version of the celebrated Wiener-Hopf equations for linear prediction.
With compactness of mathematical exposition in mind, we find it convenient to
formulate the Wiener-Hopf equations in matrix form, as shown by

$$\mathbf{R}_M \mathbf{w}_o = \mathbf{r}_M \tag{6.73}$$

where

$$\mathbf{w}_o = [w_{o,1}, w_{o,2}, \ldots, w_{o,p}]^T \tag{6.74}$$

is the p-by-1 optimum coefficient vector of the FIR predictor,

$$\mathbf{r}_M = [R_{M,1}, R_{M,2}, \ldots, R_{M,p}]^T \tag{6.75}$$

is the p-by-1 autocorrelation vector of the original message signal, excluding the mean-square
value represented by $R_{M,0}$, and

$$\mathbf{R}_M = \begin{bmatrix} R_{M,0} & R_{M,1} & \cdots & R_{M,p-1} \\ R_{M,1} & R_{M,0} & \cdots & R_{M,p-2} \\ \vdots & \vdots & \ddots & \vdots \\ R_{M,p-1} & R_{M,p-2} & \cdots & R_{M,0} \end{bmatrix} \tag{6.76}$$

is the p-by-p correlation matrix of the original message signal, including $R_{M,0}$.

Careful examination of (6.76) reveals the Toeplitz property of the autocorrelation
matrix $\mathbf{R}_M$, which embodies two distinctive characteristics:

All the elements on the main diagonal of the matrix $\mathbf{R}_M$ are equal to the mean-square
value or, equivalently under the zero-mean assumption, the variance of the
message sample $m_n$, as shown by

$$R_{M,0} = \sigma_M^2$$

The matrix is symmetric about the main diagonal.

This Toeplitz property is a direct consequence of the assumption that the message signal m(t)
is the sample function of a stationary stochastic process. From a practical perspective, the
Toeplitz property of the autocorrelation matrix $\mathbf{R}_M$ is important in that all of its elements
are uniquely defined by the autocorrelation sequence $\{R_{M,k}\}_{k=0}^{p-1}$. Moreover, from the
defining equation (6.75), it is clear that the autocorrelation vector $\mathbf{r}_M$ is uniquely defined
by the autocorrelation sequence $\{R_{M,k}\}_{k=1}^{p}$. We may therefore make the following
statement:


Typically, we have 


| r m,*| for k = 1 , 2, ...,p 
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Under this condition, we find that the autocorrelation matrix R M is also invertible; that is, 
the inverse matrix R M exists. We may therefore solve (6.73) for the unknown value of the 
optimal coefficient vector w G using the formula 

W o = R M r M 

Thus, given the variance crj^ and autocorrelation sequence {R M ^ ^ _ j, we may uniquely 
determine the optimized coefficient vector of the linear predictor, w 0 , defining an FIR 
filter of order p: and with it our design objective is satisfied. 
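
As a numerical illustration of (6.77), the following Python sketch solves the Wiener-Hopf equations for a small prediction order. The autocorrelation sequence $R_{M,k} = \sigma_M^2 \rho^{|k|}$, the values chosen for $\rho$ and $p$, and all function names are assumptions made for this example only; Gaussian elimination is used simply as one convenient way of solving the linear system.

```python
def toeplitz_matrix(r, p):
    """Build the p-by-p autocorrelation matrix R_M from the lags r[0..p-1]."""
    return [[r[abs(i - j)] for j in range(p)] for i in range(p)]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so that the caller's data is left untouched.
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("autocorrelation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - s) / aug[row][row]
    return x


def wiener_hopf_predictor(r, p):
    """Return w_o solving R_M w_o = r_M, given the lags r[0..p]."""
    big_r = toeplitz_matrix(r, p)
    small_r = [r[k] for k in range(1, p + 1)]
    return solve_linear_system(big_r, small_r)


# Assumed (hypothetical) autocorrelation sequence R_{M,k} = sigma^2 * rho^|k|.
sigma2, rho, p = 1.0, 0.9, 3
lags = [sigma2 * rho ** k for k in range(p + 1)]
w_o = wiener_hopf_predictor(lags, p)

# Minimum mean-square prediction error: sigma_M^2 - r_M^T R_M^{-1} r_M.
j_min = lags[0] - sum(rk * wk for rk, wk in zip(lags[1:], w_o))
```

For this assumed sequence only the first coefficient is nonzero, $\mathbf{w}_o \approx [\rho, 0, 0]^{\mathrm{T}}$, and the residual error is $\sigma_M^2(1 - \rho^2)$, which is smaller than $\sigma_M^2$ whenever $\rho \neq 0$.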

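To complete the linear prediction theory presented herein, we need to find the
minimum mean-square value of prediction error, resulting from the use of the optimized
predictor. We do this by first reformulating (6.71) in the matrix form:

$$J(\mathbf{w}_o) = \sigma_M^2 - 2\mathbf{w}_o^{\mathrm{T}} \mathbf{r}_M + \mathbf{w}_o^{\mathrm{T}} \mathbf{R}_M \mathbf{w}_o \tag{6.78}$$

where the superscript T denotes matrix transposition, $\mathbf{w}_o^{\mathrm{T}} \mathbf{r}_M$ is the inner product of the
p-by-1 vectors $\mathbf{w}_o$ and $\mathbf{r}_M$, and the matrix product $\mathbf{w}_o^{\mathrm{T}} \mathbf{R}_M \mathbf{w}_o$ is a quadratic form. Then,
substituting the optimum formula of (6.77) into (6.78), we find that the minimum
mean-square value of prediction error is given by

$$\begin{aligned}
J_{\min} &= \sigma_M^2 - 2(\mathbf{R}_M^{-1}\mathbf{r}_M)^{\mathrm{T}} \mathbf{r}_M + (\mathbf{R}_M^{-1}\mathbf{r}_M)^{\mathrm{T}} \mathbf{R}_M (\mathbf{R}_M^{-1}\mathbf{r}_M) \\
&= \sigma_M^2 - 2\mathbf{r}_M^{\mathrm{T}} \mathbf{R}_M^{-1} \mathbf{r}_M + \mathbf{r}_M^{\mathrm{T}} \mathbf{R}_M^{-1} \mathbf{r}_M \\
&= \sigma_M^2 - \mathbf{r}_M^{\mathrm{T}} \mathbf{R}_M^{-1} \mathbf{r}_M
\end{aligned} \tag{6.79}$$

where we have used the property that the autocorrelation matrix of a weakly stationary
process is symmetric, that is,

$$\mathbf{R}_M^{\mathrm{T}} = \mathbf{R}_M \tag{6.80}$$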

By definition, the quadratic form $\mathbf{r}_M^{\mathrm{T}} \mathbf{R}_M^{-1} \mathbf{r}_M$ is always positive. Accordingly, from (6.79)
it follows that the minimum value of the mean-square prediction error $J_{\min}$ is always
smaller than the variance $\sigma_M^2$ of the zero-mean message sample $m_n$ that is being
predicted. Through the use of linear prediction as described herein, we have thus satisfied
the objective of producing a prediction error whose variance is smaller than the variance
of the message sample being predicted.


This statement provides the rationale for going on to describe how the bandwidth 
requirement of the standard PCM can be reduced through redundancy reduction. However, 
before proceeding to do so, it is instructive that we consider an adaptive implementation of 
the linear predictor. 
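
To make (6.77) and (6.79) concrete, it may help to work through the simplest case of a first-order predictor, $p = 1$. The normalized correlation coefficient $\rho = R_{M,1}/R_{M,0}$ is notation introduced here for the example only. For $p = 1$ the matrix $\mathbf{R}_M$ reduces to the scalar $R_{M,0} = \sigma_M^2$ and the vector $\mathbf{r}_M$ to the scalar $R_{M,1}$, so that

$$w_{o,1} = \frac{R_{M,1}}{R_{M,0}} = \rho, \qquad J_{\min} = \sigma_M^2 - \frac{R_{M,1}^2}{R_{M,0}} = \sigma_M^2 (1 - \rho^2)$$

Taking $\rho = 0.9$ as an assumed value, we get $J_{\min} = 0.19\,\sigma_M^2$; that is, the prediction error carries roughly one-fifth of the power of the message sample.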

The use of (6.77) for calculating the optimum weight vector of a linear predictor requires
knowledge of the autocorrelation function $R_{M,k}$ of the message signal sequence $\{m_k\}_{k=0}^{p}$,
where $p$ is the prediction order. What if knowledge of this sequence is not available? In
situations of this kind, which occur frequently in practice, we may resort to the use of an
adaptive predictor.

The predictor is said to be adaptive in the following sense: 

• Computation of the tap weights $w_k$, $k = 1, 2, \ldots, p$, proceeds in an iterative manner,
starting from some arbitrary initial values of the tap weights.

• The algorithm used to adjust the tap weights (from one iteration to the next) is
“self-designed,” operating solely on the basis of available data.

The aim of the algorithm is to find the minimum point of the bowl-shaped error surface 
that describes the dependence of the cost function J on the tap weights. It is, therefore, 
intuitively reasonable that successive adjustments to the tap weights of the predictor be 
made in the direction of the steepest descent of the error surface; that is, in a direction 
opposite to the gradient vector whose elements are defined by 



$$g_k = \frac{\partial J}{\partial w_k}, \qquad k = 1, 2, \ldots, p \tag{6.81}$$

This is indeed the idea behind the method of steepest descent. Let $w_{k,n}$ denote the value of
the $k$th tap weight at iteration $n$. Then, the updated value of this weight at iteration $n + 1$ is
defined by

$$w_{k,n+1} = w_{k,n} - \frac{1}{2}\mu g_k, \qquad k = 1, 2, \ldots, p \tag{6.82}$$

where $\mu$ is a step-size parameter that controls the speed of adaptation and the factor 1/2 is
included for convenience of presentation. Differentiating the cost function $J$ of (6.68) with
respect to $w_k$, we readily find that


$$g_k = -2E[m_n m_{n-k}] + 2\sum_{j=1}^{p} w_j E[m_{n-j} m_{n-k}], \qquad k = 1, 2, \ldots, p \tag{6.83}$$


From a practical perspective, the formula for the gradient $g_k$ in (6.83) could do with further
simplification that ignores the expectation operator. In effect, instantaneous values are
used as estimates of autocorrelation functions. The motivation for this simplification is to
permit the adaptive process to proceed forward on a step-by-step basis in a self-organized
manner. Clearly, by ignoring the expectation operator in (6.83), the gradient $g_k$ takes on a
time-dependent value, denoted by $g_{k,n}$. We may thus write

$$g_{k,n} = -2m_n m_{n-k} + 2m_{n-k}\sum_{j=1}^{p} \hat{w}_{j,n}\, m_{n-j}, \qquad k = 1, 2, \ldots, p \tag{6.84}$$

where $\hat{w}_{j,n}$ is an estimate of the filter coefficient $w_{j,n}$ at time $n$.

The stage is now set for substituting (6.84) into (6.82), where in the latter equation $\hat{w}_{k,n}$
is substituted for $w_{k,n}$; this change is made to account for dispensing with the expectation
operator:

$$\begin{aligned}
\hat{w}_{k,n+1} &= \hat{w}_{k,n} - \frac{1}{2}\mu g_{k,n} \\
&= \hat{w}_{k,n} + \mu\left(m_n m_{n-k} - \sum_{j=1}^{p} \hat{w}_{j,n}\, m_{n-j} m_{n-k}\right) \\
&= \hat{w}_{k,n} + \mu m_{n-k}\left(m_n - \sum_{j=1}^{p} \hat{w}_{j,n}\, m_{n-j}\right) \\
&= \hat{w}_{k,n} + \mu m_{n-k} e_n
\end{aligned} \tag{6.85}$$

where $e_n$ is the new prediction error defined by

$$e_n = m_n - \sum_{j=1}^{p} \hat{w}_{j,n}\, m_{n-j} \tag{6.86}$$

Note that the current value of the message signal, $m_n$, plays a role as the desired response
for predicting the value of $m_n$ given the past values of the message signal: $m_{n-1}, m_{n-2},
\ldots, m_{n-p}$.

In words, we may express the adaptive filtering algorithm of (6.85) as follows:

(Updated value of the kth filter coefficient at time n + 1) = (Old value of the same filter coefficient at time n) + (Step-size parameter) × (Message signal $m_n$ delayed by k time steps) × (Prediction error computed at time n)

The algorithm just described is the popular least-mean-square (LMS) algorithm,
formulated for the purpose of linear prediction. The reason for the popularity of this adaptive
filtering algorithm is the simplicity of its implementation. In particular, the computational
complexity of the algorithm, measured in terms of the number of additions and
multiplications, is linear in the prediction order $p$. Moreover, the algorithm is not only
computationally efficient but it is also effective in performance.

The LMS algorithm is a stochastic adaptive filtering algorithm, stochastic in the sense
that, starting from the initial condition defined by $\{\hat{w}_{k,0}\}_{k=1}^{p}$, it seeks to find the
minimum point of the error surface by following a zig-zag path. However, it never finds
this minimum point exactly. Rather, it continues to execute a random motion around the
minimum point of the error surface (Haykin, 2013).
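
The LMS recursion of (6.85) and (6.86) can be sketched in a few lines of Python. The test signal (a first-order autoregressive process with coefficient 0.9), the step size, the prediction order, and the function name `lms_predict` are all assumptions chosen for illustration; they are not taken from the text.

```python
import random


def lms_predict(samples, p, mu):
    """Run an LMS one-step predictor; return the final weights and errors."""
    w = [0.0] * p                       # arbitrary initial tap weights
    errors = []
    for n in range(p, len(samples)):
        # Past values m_{n-1}, ..., m_{n-p} seen by the FIR predictor.
        past = [samples[n - k] for k in range(1, p + 1)]
        prediction = sum(wk * mk for wk, mk in zip(w, past))
        e_n = samples[n] - prediction   # prediction error, as in (6.86)
        # Weight update, as in (6.85): w_k <- w_k + mu * m_{n-k} * e_n
        w = [wk + mu * mk * e_n for wk, mk in zip(w, past)]
        errors.append(e_n)
    return w, errors


# Hypothetical test signal: m_n = 0.9 m_{n-1} + white Gaussian noise,
# for which the optimum predictor weights are close to [0.9, 0].
rng = random.Random(0)
m = [0.0]
for _ in range(20000):
    m.append(0.9 * m[-1] + rng.gauss(0.0, 1.0))

weights, err = lms_predict(m, p=2, mu=0.001)
```

Each iteration costs on the order of $2p$ multiplications, in line with the linear complexity noted above. Under these assumptions the weights settle near the Wiener solution and then keep fluctuating around it rather than converging exactly, which is the random motion described in the text.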

Differential Pulse-Code Modulation 


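DPCM, the scheme to be considered for channel-bandwidth conservation, exploits the
idea of linear prediction theory with a practical difference: in the transmitter, the linear
prediction is performed on a quantized version of the message sample rather than on the
message sample itself, as illustrated in Figure 6.18.

Figure 6.18 Block diagram of a differential quantizer.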


The resulting process is referred to as differential quantization. The motivation behind the 
use of differential quantization follows from two practical considerations: 

1. Waveform encoding in the transmitter requires the use of quantization.

2. Waveform decoding in the receiver, therefore, has to process a quantized signal.

In order to cater to both requirements in such a way that the same structure is used for
predictors in both the transmitter and the receiver, the transmitter has to perform
prediction-error filtering on the quantized version of the message signal rather than the signal itself, as
shown in Figure 6.19a. Then, assuming a noise-free channel, the predictors in the transmitter
and receiver operate on exactly the same sequence of quantized message samples.

To demonstrate this highly desirable and distinctive characteristic of differential PCM,
we see from Figure 6.19a that

$$e_{q,n} = e_n + q_n \tag{6.87}$$

where $q_n$ is the quantization noise produced by the quantizer operating on the prediction
error $e_n$. Moreover, from Figure 6.19a, we readily see that

$$m_{q,n} = \hat{m}_n + e_{q,n} \tag{6.88}$$

where $\hat{m}_n$ is the predicted value of the original message sample $m_n$; thus, (6.88) is in
perfect agreement with Figure 6.18. Hence, the use of (6.87) in (6.88) yields

$$m_{q,n} = \hat{m}_n + e_n + q_n \tag{6.89}$$

We may now invoke the definition of the prediction error in linear prediction theory,
$e_n = m_n - \hat{m}_n$, to rewrite (6.89) in the equivalent form:

$$m_{q,n} = m_n + q_n \tag{6.90}$$

which describes a quantized version of the original message sample $m_n$.

Figure 6.19 DPCM system: (a) transmitter, with the sampled version of the message signal, $m_n$, applied to the comparator and the DPCM-encoded signal at the output; (b) receiver, with a noisy version of the DPCM-encoded signal at the input and the DPCM decoded output.

With the differential quantization scheme of Figure 6.19a at hand, we may now expand 
on the structures of the transmitter and receiver of DPCM. 


Operation of the DPCM transmitter proceeds as follows:

1. Given the predicted message sample $\hat{m}_n$, the comparator at the transmitter input
computes the prediction error $e_n$, which is quantized to produce the quantized
version of $e_n$ in accordance with (6.87).

2. With $\hat{m}_n$ and $e_{q,n}$ at hand, the adder in the transmitter produces the quantized
version of the original message sample $m_n$, namely $m_{q,n}$, in accordance with (6.88).

3. The required one-step prediction $\hat{m}_n$ is produced by applying the sequence of
quantized samples $\{m_{q,n-k}\}_{k=1}^{p}$ to a linear FIR predictor of order $p$.

This multistage operation is clearly cyclic, encompassing three steps that are repeated at
each time step $n$. Moreover, at each time step, the encoder operates on the quantized
prediction error $e_{q,n}$ to produce the DPCM-encoded version of the original message
sample $m_n$. The DPCM code so produced is a lossy-compressed version of the PCM code;
it is “lossy” because of the quantization applied to the prediction error.


The structure of the receiver is much simpler than that of the transmitter, as depicted in
Figure 6.19b. Specifically, first, the decoder reconstructs the quantized version of the
prediction error, namely $e_{q,n}$. An estimate of the original message sample $m_n$ is then
computed by applying the decoder output to the same predictor used in the transmitter of
Figure 6.19a. In the absence of channel noise, the encoded signal at the receiver input is
identical to the encoded signal at the transmitter output. Under this ideal condition, we
find that the corresponding receiver output is equal to $m_{q,n}$, which differs from the original
signal sample $m_n$ only by the quantization error $q_n$ incurred as a result of quantizing the
prediction error $e_n$.

From the foregoing analysis, we thus observe that, in a noise-free environment, the
linear predictors in the transmitter and receiver of DPCM operate on the same sequence of
samples, $m_{q,n}$. It is with this point in mind that a feedback path is appended to the
quantizer in the transmitter of Figure 6.19a.
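
The transmitter cycle and the receiver described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's implementation: the uniform midtread quantizer, the fixed first-order predictor weight 0.9, the step size, and the sinusoidal test input are all hypothetical choices, and the binary encoding of $e_{q,n}$ is omitted.

```python
import math


def quantize(x, step):
    """Uniform midtread quantizer with the given step size (assumed form)."""
    return step * round(x / step)


def dpcm_encode(samples, weights, step):
    """Return the quantized prediction errors e_{q,n} and the samples m_{q,n}."""
    p = len(weights)
    history = [0.0] * p                  # m_{q,n-1}, ..., m_{q,n-p}
    e_q, m_q = [], []
    for m_n in samples:
        m_hat = sum(w * h for w, h in zip(weights, history))  # prediction
        e_n = m_n - m_hat                # comparator output
        eq_n = quantize(e_n, step)       # e_{q,n} = e_n + q_n, as in (6.87)
        mq_n = m_hat + eq_n              # adder output, as in (6.88)
        e_q.append(eq_n)
        m_q.append(mq_n)
        history = [mq_n] + history[:-1]  # feedback path around the quantizer
    return e_q, m_q


def dpcm_decode(e_q, weights):
    """Rebuild m_{q,n} from e_{q,n} using the same predictor as the encoder."""
    p = len(weights)
    history = [0.0] * p
    out = []
    for eq_n in e_q:
        m_hat = sum(w * h for w, h in zip(weights, history))
        mq_n = m_hat + eq_n
        out.append(mq_n)
        history = [mq_n] + history[:-1]
    return out


# Hypothetical example: slowly varying sinusoid, first-order predictor.
m = [math.sin(2.0 * math.pi * 0.01 * n) for n in range(200)]
weights, step = [0.9], 0.05
e_q, m_q = dpcm_encode(m, weights, step)
decoded = dpcm_decode(e_q, weights)
```

Because both predictors are driven by the same sequence $m_{q,n}$, the decoder output coincides with the encoder's $m_{q,n}$ over a noise-free channel, and the reconstruction error is just the quantization error $q_n$, bounded here by half the step size.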

The output SNR of the DPCM system, shown in Figure 6.19, is, by definition,

$$(\mathrm{SNR})_O = \frac{\sigma_M^2}{\sigma_Q^2} \tag{6.91}$$

where $\sigma_M^2$ is the variance of the original signal sample $m_n$, assumed to be of zero mean,
and $\sigma_Q^2$ is the variance of the quantization error $q_n$, also of zero mean. We may rewrite
(6.91) as the product of two factors, as shown by

$$\begin{aligned}
(\mathrm{SNR})_O &= \left(\frac{\sigma_M^2}{\sigma_E^2}\right)\left(\frac{\sigma_E^2}{\sigma_Q^2}\right) \\
&= G_p\, (\mathrm{SNR})_Q
\end{aligned} \tag{6.92}$$

where, in the first line, $\sigma_E^2$ is the variance of the prediction error $e_n$. The factor $(\mathrm{SNR})_Q$
introduced in the second line is the signal-to-quantization noise ratio, which is itself
defined by

$$(\mathrm{SNR})_Q = \frac{\sigma_E^2}{\sigma_Q^2} \tag{6.93}$$

The other factor $G_p$ is the processing gain produced by the differential quantization
scheme; it is formally defined by

$$G_p = \frac{\sigma_M^2}{\sigma_E^2} \tag{6.94}$$

The quantity $G_p$, when it is greater than unity, represents a gain in signal-to-noise ratio,
which is due to the differential quantization scheme of Figure 6.19. Now, for a given
message signal, the variance $\sigma_M^2$ is fixed, so that $G_p$ is maximized by minimizing the
variance $\sigma_E^2$ of the prediction error $e_n$. Accordingly, the objective in implementing the
DPCM should be to design the prediction filter so as to minimize the prediction-error
variance, $\sigma_E^2$.

In the case of voice signals, it is found that the optimum signal-to-quantization noise
advantage of the DPCM over the standard PCM is in the neighborhood of 4-11 dB. Based
on experimental studies, it appears that the greatest improvement occurs in going from no
prediction to first-order prediction, with some additional gain resulting from increasing
the order $p$ of the prediction filter up to 4 or 5, after which little additional gain is obtained.
Since 6 dB of quantization noise is equivalent to 1 bit per sample by virtue of the results
presented in Table 6.1 for sinusoidal modulation, the advantage of DPCM may also be
expressed in terms of bit rate. For a constant signal-to-quantization noise ratio, and
assuming a sampling rate of 8 kHz, the use of DPCM may provide a saving of about
8-16 kb/s (i.e., 1 to 2 bits per sample) compared with the standard PCM.
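
The arithmetic behind these figures can be checked with a short Python sketch. The function names are hypothetical, the 6 dB-per-bit rule is the one quoted above, and the variance ratio 1/0.19 is an assumed example (it corresponds to an idealized first-order predictor with correlation coefficient 0.9, ignoring the effect of quantization inside the prediction loop).

```python
import math


def processing_gain_db(var_message, var_error):
    """Processing gain G_p = sigma_M^2 / sigma_E^2, expressed in decibels."""
    return 10.0 * math.log10(var_message / var_error)


def bit_rate_saving(gain_db, sampling_rate_hz, db_per_bit=6.0):
    """Rough bit-rate saving in bits/s, using the 6 dB-per-bit rule of thumb."""
    bits_saved_per_sample = gain_db / db_per_bit
    return bits_saved_per_sample * sampling_rate_hz


# Assumed example: prediction-error variance equal to 0.19 times the
# message variance.
gain = processing_gain_db(1.0, 0.19)

# Savings for gains of 6 dB and 12 dB at the 8 kHz sampling rate of the text.
saving_low = bit_rate_saving(6.0, 8000.0)
saving_high = bit_rate_saving(12.0, 8000.0)
```

With these inputs the two savings come to 8 kb/s and 16 kb/s, that is, 1 and 2 bits per sample at 8 kHz, while the assumed variance ratio gives a processing gain of a little over 7 dB, which lies inside the 4-11 dB range quoted for voice.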

In choosing DPCM for waveform coding, we are, in effect, economizing on transmission
bandwidth by increasing system complexity, compared with standard PCM. In other 
words, DPCM exploits the complexity-bandwidth tradeoff. However, in practice, the need 
may arise for reduced system complexity compared with the standard PCM. To achieve 
this other objective, transmission bandwidth is traded off for reduced system complexity, 
which is precisely the motivation behind DM. Thus, whereas DPCM exploits the
complexity-bandwidth tradeoff, DM exploits the bandwidth-complexity tradeoff. We may,
therefore, differentiate between the standard PCM, the DPCM, and the DM along the lines 
described in Figure 6.20. With the bandwidth-complexity tradeoff being at the heart of 
DM, the incoming message signal m(t) is oversampled, which requires the use of a 
sampling rate higher than the Nyquist rate. Accordingly, the correlation between adjacent 
samples of the message signal is purposely increased so as to permit the use of a simple 
quantizing strategy for constructing the encoded signal. 


In the DM transmitter, system complexity is reduced to the minimum possible by using the
combination of two strategies:

1. Single-bit quantizer, which is the simplest quantizing strategy; as depicted in Figure
6.21, the quantizer acts as a hard limiter with only two decision levels, namely, $\pm\Delta$.

2. Single unit-delay element, which is the most primitive form of a predictor; in other
words, the only component retained in the FIR predictor of Figure 6.17 is the front-end
block labeled $z^{-1}$, which acts as an accumulator.

Thus, replacing the multilevel quantizer and the FIR predictor in the DPCM transmitter of 
Figure 6.19a in the manner described under points 1 and 2, respectively, we obtain the 
block diagram of Figure 6.21a for the DM transmitter. 

Figure 6.20 Illustrating the tradeoffs between standard PCM, DPCM, and DM (axes: system complexity and transmission bandwidth).

Figure 6.21 DM system: (a) transmitter; (b) receiver.

From Figure 6.21a, we may describe the operation of the DM transmitter by the following
set of equations (6.95)-(6.97):

$$e_n = m_n - m_{q,n-1} \tag{6.95}$$

$$e_{q,n} = \Delta\,\mathrm{sgn}[e_n] = \begin{cases} +\Delta & \text{if } e_n > 0 \\ -\Delta & \text{if } e_n < 0 \end{cases} \tag{6.96}$$

$$m_{q,n} = m_{q,n-1} + e_{q,n} \tag{6.97}$$

According to (6.95) and (6.96), two possibilities may naturally occur:

1. The error signal $e_n$ (i.e., the difference between the message sample $m_n$ and its
approximation $\hat{m}_n$) is positive, in which case the approximation $\hat{m}_n = m_{q,n-1}$ is
increased by the amount $\Delta$; in this first case, the encoder sends out symbol 1.

2. The error signal $e_n$ is negative, in which case the approximation $\hat{m}_n = m_{q,n-1}$ is
reduced by the amount $\Delta$; in this second case, the encoder sends out symbol 0.

From this description it is apparent that the delta modulator produces a staircase
approximation to the message signal, as illustrated in Figure 6.22a. Moreover, the rate of
data transmission in DM is equal to the sampling rate $f_s = 1/T_s$, as illustrated in the binary
sequence of Figure 6.22b.
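
The three DM transmitter equations, together with the accumulator in the receiver, translate almost directly into code. The sketch below is illustrative only: the test sinusoid, the sampling rate, the step size, the zero initial state of the accumulator, and the convention of treating $e_n = 0$ as negative are all assumptions not specified in the text.

```python
import math


def dm_encode(samples, delta):
    """Delta-modulate the samples; return the bit stream and the staircase."""
    bits, staircase = [], []
    m_q_prev = 0.0                          # assumed initial accumulator state
    for m_n in samples:
        e_n = m_n - m_q_prev                # quantizer input, as in (6.95)
        # One-bit quantizer, as in (6.96); e_n = 0 is treated as negative here.
        e_q = delta if e_n > 0 else -delta
        m_q = m_q_prev + e_q                # accumulator, as in (6.97)
        bits.append(1 if e_q > 0 else 0)
        staircase.append(m_q)
        m_q_prev = m_q
    return bits, staircase


def dm_decode(bits, delta):
    """Accumulate +delta for symbol 1 and -delta for symbol 0."""
    out, acc = [], 0.0
    for b in bits:
        acc += delta if b == 1 else -delta
        out.append(acc)
    return out


# Hypothetical example: 1 Hz unit-amplitude sinusoid sampled at 100 Hz.
fs, delta = 100.0, 0.1
m = [math.sin(2.0 * math.pi * n / fs) for n in range(300)]
bits, staircase = dm_encode(m, delta)
rebuilt = dm_decode(bits, delta)
```

One bit is produced per sample, so the bit rate equals the sampling rate. For this assumed input the step size is large enough to avoid slope overload, and the staircase stays within one step of the message; the low-pass filter that follows the accumulator in the receiver is not modeled.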

Figure 6.22 Illustration of DM: (a) staircase approximation to the message signal; (b) binary sequence at modulator output.


Following a procedure similar to the way in which we constructed the DM transmitter of 
Figure 6.21a, we may construct the DM receiver of Figure 6.21b as a special case of the 
DPCM receiver of Figure 6.19b. Working through the operation of the DM receiver, we 
find that reconstruction of the staircase approximation to the original message signal is 
achieved by passing the sequence of positive and negative pulses (representing symbols 1 
and 0, respectively) through the block labeled “accumulator.” 

Under the assumption that the channel is distortionless, the accumulated output is the
desired $m_{q,n}$ given that the decoded channel output is $e_{q,n}$. The out-of-band quantization
noise in the high-frequency staircase waveform in the accumulator output is suppressed by 
passing it through a low-pass filter with a cutoff frequency equal to the message 
bandwidth. 


DM is subject to two types of quantization error: slope overload distortion and granular 
noise. We will discuss the case of slope overload distortion first. 

Starting with (6.97), we observe that this equation is the digital equivalent of
integration, in the sense that it represents the accumulation of positive and negative
increments of magnitude $\Delta$. Moreover, denoting the quantization error applied to the
message sample $m_n$ by $q_n$, we may express the quantized message sample as

$$m_{q,n} = m_n + q_n \tag{6.98}$$

With this expression for $m_{q,n}$ at hand, substituting (6.98) into (6.95) shows that the
quantizer input is

$$e_n = m_n - m_{n-1} - q_{n-1} \tag{6.99}$$

Thus, except for the delayed quantization error $q_{n-1}$, the quantizer input is a first
backward difference of the original message sample. This difference may be viewed as a
digital approximation to the derivative of the message signal or, equivalently, as the inverse of the digital
integration process carried out in the DM transmitter. If, then, we consider the maximum
slope of the original message signal $m(t)$, it is clear that in order for the sequence of
samples $\{m_{q,n}\}$ to increase as fast as the sequence of message samples $\{m_n\}$ in a region of
maximum slope of $m(t)$, we require that the condition

$$\frac{\Delta}{T_s} \ge \max\left|\frac{dm(t)}{dt}\right| \tag{6.100}$$

be satisfied. Otherwise, we find that the step size $\Delta$ is too small for the staircase
approximation $m_q(t)$ to follow a steep segment of the message signal $m(t)$, with the result
that $m_q(t)$ falls behind $m(t)$, as illustrated in Figure 6.23. This condition is called slope
overload, and the resulting quantization error is called slope-overload distortion (noise).
Note that since the maximum slope of the staircase approximation $m_q(t)$ is fixed by the
step size $\Delta$, increases and decreases in $m_q(t)$ tend to occur along straight lines. For this
reason, a delta modulator using a fixed step size is often referred to as a linear delta
modulator.

In contrast to slope-overload distortion, granular noise occurs when the step size $\Delta$ is
too large relative to the local slope characteristics of the message signal $m(t)$, thereby
causing the staircase approximation $m_q(t)$ to hunt around a relatively flat segment of $m(t)$;
this phenomenon is also illustrated in the tail end of Figure 6.23. Granular noise is
analogous to quantization noise in a PCM system.


From the discussion just presented, it is apparent that we need to have a large step size
to accommodate a wide dynamic range, whereas a small step size is required for the 
accurate representation of relatively low-level signals. It is clear, therefore, that the choice 
of the optimum step size that minimizes the mean-square value of the quantization error in 
a linear delta modulator will be the result of a compromise between slope-overload 
distortion and granular noise. To satisfy such a requirement, we need to make the delta 
modulator “adaptive,” in the sense that the step size is made to vary in accordance with the 
input signal. The step size is thereby made variable, such that it is enlarged during 
intervals when the slope-overload distortion is dominant and reduced in value when the 
granular (quantization) noise is dominant. 
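
For a sinusoidal message $m(t) = A\sin(2\pi f t)$ the maximum slope is $2\pi f A$, so the slope-overload condition becomes $\Delta f_s \ge 2\pi f A$. The following Python sketch evaluates this bound; the sinusoidal input, the numerical values, and the function names are assumptions introduced for the example.

```python
import math


def min_step_size(amplitude, frequency_hz, sampling_rate_hz):
    """Smallest step size satisfying delta / T_s >= max|dm/dt| for a sinusoid."""
    max_slope = 2.0 * math.pi * frequency_hz * amplitude
    return max_slope / sampling_rate_hz


def is_slope_overloaded(delta, amplitude, frequency_hz, sampling_rate_hz):
    """True when the staircase cannot keep up with the steepest segment."""
    return delta < min_step_size(amplitude, frequency_hz, sampling_rate_hz)


# Hypothetical example: 800 Hz tone of unit amplitude sampled at 64 kHz.
delta_min = min_step_size(1.0, 800.0, 64000.0)
overloaded_small = is_slope_overloaded(0.05, 1.0, 800.0, 64000.0)
overloaded_large = is_slope_overloaded(0.10, 1.0, 800.0, 64000.0)
```

In this example the bound is $\Delta \ge \pi/40 \approx 0.079$, so a step of 0.05 is overloaded while a step of 0.10 is not. A step much larger than the bound avoids overload only at the cost of more granular noise, which is the compromise that an adaptive step size is meant to ease.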


Figure 6.23 Illustration of the two different forms of quantization error in DM: slope-overload distortion and granular noise.

Line Codes


In this chapter, we have described three basic waveform-coding schemes: PCM, DPCM, 
and DM. Naturally, they differ from each other in several ways: transmission-bandwidth 
requirement, transmitter-receiver structural composition and complexity, and quantization 
noise. Nevertheless, all three of them have a common need: line codes for electrical 
representation of the encoded binary streams produced by their individual transmitters, so 
as to facilitate transmission of the binary streams across the communication channel. 

Figure 6.24 displays the waveforms of five important line codes for the example data 
stream 01101001. Figure 6.25 displays their individual power spectra (for positive 
frequencies) for randomly generated binary data, assuming that first, symbols 0 and 1 are 
equiprobable, second, the average power is normalized to unity, and third, the frequency $f$
is normalized with respect to the bit rate $1/T_b$. In what follows, we describe the five line
codes involved in generating the coded waveforms of Figure 6.24. 


Figure 6.24 Line codes for the electrical representations of binary data, shown for the binary data 01101001: (a) unipolar
nonreturn-to-zero (NRZ) signaling; (b) polar NRZ signaling; (c) unipolar return-to-zero
(RZ) signaling; (d) bipolar RZ signaling; (e) split-phase or Manchester code.

Figure 6.25 Power spectra of line codes (vertical axis: normalized power spectral density): (a) unipolar NRZ signal; (b) polar NRZ signal; (c) unipolar
RZ signal; (d) bipolar RZ signal; (e) Manchester-encoded signal. The frequency is normalized with
respect to the bit rate $1/T_b$, and the average power is normalized to unity.

In this line code, symbol 1 is represented by transmitting a pulse of amplitude A for the
duration of the symbol, and symbol 0 is represented by switching off the pulse, as in 
Figure 6.24a. The unipolar NRZ line code is also referred to as on-off signaling. 
Disadvantages of on-off signaling are the waste of power due to the transmitted DC level 
and the fact that the power spectrum of the transmitted signal does not approach zero at 
zero frequency. 


In this second line code, symbols 1 and 0 are represented by transmitting pulses of 
amplitudes +A and -A, respectively, as illustrated in Figure 6.24b. The polar NRZ line 
code is relatively easy to generate, but its disadvantage is that the power spectrum of the 
signal is large near zero frequency. 


In this third line code, symbol 1 is represented by a rectangular pulse of amplitude A and 
half-symbol width and symbol 0 is represented by transmitting no pulse, as illustrated in 
Figure 6.24c. An attractive feature of the unipolar RZ line code is the presence of delta 
functions at $f = 0, \pm 1/T_b$ in the power spectrum of the transmitted signal; the delta
functions can be used for bit-timing recovery at the receiver. However, its disadvantage is 
that it requires 3 dB more power than polar RZ signaling for the same probability of 
symbol error. 


This line code uses three amplitude levels, as indicated in Figure 6.24d. Specifically,
positive and negative pulses of equal amplitude (i.e., +A and -A) are used alternately for 
symbol 1, with each pulse having a half-symbol width; no pulse is always used for symbol 
0. A useful property of the bipolar RZ signaling is that the power spectrum of the 
transmitted signal has no DC component and relatively insignificant low-frequency 
components for the case when symbols 1 and 0 occur with equal probability. The bipolar 
RZ line code is also called alternate mark inversion (AMI) signaling. 


In this final method of signaling, illustrated in Figure 6.24e, symbol 1 is represented by a 
positive pulse of amplitude A followed by a negative pulse of amplitude -A, with both 
pulses being half-symbol wide. For symbol 0, the polarities of these two pulses are 
reversed. A unique property of the Manchester code is that it suppresses the DC 
component and has relatively insignificant low-frequency components, regardless of the 
signal statistics. This property is essential in some applications. 
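
The five line codes can be compared side by side with a small Python sketch that maps each bit to two half-symbol amplitude levels. The function name `line_code`, the scheme labels, and the choice of a positive pulse for the first 1 in bipolar RZ are assumptions made for illustration.

```python
def line_code(bits, scheme, amplitude=1.0):
    """Return two half-symbol levels per bit for the chosen line code."""
    a = amplitude
    levels = []
    last_mark = -a                       # so the first bipolar mark is +a (assumed)
    for b in bits:
        if scheme == "unipolar_nrz":
            half = (a, a) if b else (0.0, 0.0)
        elif scheme == "polar_nrz":
            half = (a, a) if b else (-a, -a)
        elif scheme == "unipolar_rz":
            half = (a, 0.0) if b else (0.0, 0.0)
        elif scheme == "bipolar_rz":
            if b:
                last_mark = -last_mark   # alternate mark inversion
                half = (last_mark, 0.0)
            else:
                half = (0.0, 0.0)
        elif scheme == "manchester":
            half = (a, -a) if b else (-a, a)
        else:
            raise ValueError("unknown line code: " + scheme)
        levels.extend(half)
    return levels


data = [0, 1, 1, 0, 1, 0, 0, 1]          # example stream 01101001 from the text
unipolar = line_code(data, "unipolar_nrz")
bipolar = line_code(data, "bipolar_rz")
manchester = line_code(data, "manchester")
```

Summing the levels gives a quick check on the DC content of each waveform: for this data stream the bipolar RZ and Manchester waveforms sum to zero, whereas the unipolar NRZ waveform does not, in keeping with the remarks above on the transmitted DC level.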

Summary and Discussion


In this chapter we introduced two fundamental and complementary processes: 

• Sampling, which operates in the time domain; the sampling process is the link 
between an analog waveform and its discrete-time representation. 

• Quantization, which operates in the amplitude domain; the quantization process is
the link between an analog waveform and its discrete-amplitude representation. 

The sampling process builds on the sampling theorem, which states that a strictly band- 
limited signal with no frequency components higher than W Hz is represented uniquely by 
a sequence of samples taken at a uniform rate equal to or greater than the Nyquist rate of 
$2W$ samples per second. The quantization process exploits the fact that any human sense,
as ultimate receiver, can only detect finite intensity differences. 

The sampling process is basic to the operation of all pulse modulation systems, which 
may be classified into analog pulse modulation and digital pulse modulation. The 
distinguishing feature between them is that analog pulse modulation systems maintain a 
continuous amplitude representation of the message signal, whereas digital pulse 
modulation systems also employ quantization to provide a representation of the message 
signal that is discrete in both time and amplitude. 

Analog pulse modulation results from varying some parameter of the transmitted 
pulses, such as amplitude, duration, or position, in which case we speak of PAM, pulse- 
duration modulation, or pulse-position modulation, respectively. In this chapter we 
focused on PAM, as it is used in all forms of digital pulse modulation. 

Digital pulse modulation systems transmit analog message signals as a sequence of 
coded pulses, which is made possible through the combined use of sampling and 
quantization. PCM is an important form of digital pulse modulation that is endowed with 
some unique system advantages, which, in turn, have made it the standard method of 
modulation for the transmission of such analog signals as voice and video signals. The 
advantages of PCM include robustness to noise and interference, efficient regeneration of 
the coded pulses along the transmission path, and a uniform format for different kinds of 
baseband signals. 

Indeed, it is because of this list of advantages unique to PCM that it has become the 
method of choice for the construction of public switched telephone networks (PSTNs). In 
this context, the reader should carefully note that the telephone channel viewed from the 
PSTN by an Internet service provider, for example, is nonlinear due to the use of 
companding and, most importantly, it is entirely digital. This observation has a significant 
impact on the design of high-speed modems for communications between a computer user 
and server, which will be discussed in Chapter 8. 

DM and DPCM are two other useful forms of digital pulse modulation. The principal 
advantage of DM is the simplicity of its circuitry, which is achieved at the expense of 
increased transmission bandwidth. In contrast, DPCM employs increased circuit 
complexity to reduce channel bandwidth. The improvement is achieved by using the idea 
of prediction to reduce redundant symbols from an incoming data stream. A further 
improvement in the operation of DPCM can be made through the use of adaptivity to 
account for statistical variations in the input data. By so doing, bandwidth requirement 
may be reduced significantly without serious degradation in system performance. 

Problems

Sampling Process 

In natural sampling, an analog signal $g(t)$ is multiplied by a periodic train of rectangular pulses $c(t)$,
each of unit area. Given that the pulse repetition frequency of this periodic train is $f_s$ and the duration
of each rectangular pulse is $T$ (with $f_s T \ll 1$), do the following:

Find the spectrum of the signal $s(t)$ that results from the use of natural sampling; you may assume
that time $t = 0$ corresponds to the midpoint of a rectangular pulse in $c(t)$.

Show that the original signal $g(t)$ may be recovered exactly from its naturally sampled version,
provided that the conditions embodied in the sampling theorem are satisfied.

Specify the Nyquist rate and the Nyquist interval for each of the following signals: 
$g(t) = \mathrm{sinc}(200t)$.
$g(t) = \mathrm{sinc}^2(200t)$.
$g(t) = \mathrm{sinc}(200t) + \mathrm{sinc}^2(200t)$.

Discussion of the sampling theorem presented in Section 6.2 was confined to the time domain. 
Describe how the sampling theorem can be applied in the frequency domain. 

Pulse-Amplitude Modulation 

Figure P6.4 shows the idealized spectrum of a message signal m(t). The signal is sampled at a rate 
equal to 1 kHz using flat-top pulses, with each pulse being of unit amplitude and duration 0.1 ms.
Determine and sketch the spectrum of the resulting PAM signal. 



In this problem, we evaluate the equalization needed for the aperture effect in a PAM system. The
operating frequency $f = f_s/2$, which corresponds to the highest frequency component of the message
signal for a sampling rate equal to the Nyquist rate. Plot $1/\mathrm{sinc}(0.5\,T/T_s)$ versus $T/T_s$, and hence find
the equalization needed when $T/T_s = 0.1$.

Consider a PAM wave transmitted through a channel with white Gaussian noise and minimum
bandwidth $B_T = 1/(2T_s)$, where $T_s$ is the sampling period. The noise is of zero mean and power
spectral density $N_0/2$. The PAM signal uses a standard pulse $g(t)$ with its Fourier transform defined
by

$$G(f) = \begin{cases} \dfrac{1}{2B_T}, & |f| < B_T \\ 0, & |f| > B_T \end{cases}$$

By considering a full-load sinusoidal modulating wave, show that PAM and baseband-signal
transmission have equal SNRs for the same average transmitted power.

Twenty-four voice signals are sampled uniformly and then time-division multiplexed (TDM). The
sampling operation uses flat-top samples with 1 μs duration. The multiplexing operation includes
provision for synchronization by adding an extra pulse of sufficient amplitude and also 1 μs duration.
The highest frequency component of each voice signal is 3.4 kHz.

Assuming a sampling rate of 8 kHz, calculate the spacing between successive pulses of the 
multiplexed signal. 

Repeat your calculation assuming the use of Nyquist rate sampling. 

Twelve different message signals, each with a bandwidth of 10 kHz, are to be multiplexed and 
transmitted. Determine the minimum bandwidth required if the multiplexing/modulation method 
used is time-division multiplexing (TDM), which was discussed in Chapter 1. 

Pulse-Code Modulation 

A speech signal has a total duration of 10 s. It is sampled at the rate of 8 kHz and then encoded. The 
signal-to-(quantization) noise ratio is required to be 40 dB. Calculate the minimum storage capacity 
needed to accommodate this digitized speech signal. 

Consider a uniform quantizer characterized by the input-output relation illustrated in Figure 6.9a. 
Assume that a Gaussian-distributed random variable with zero mean and unit variance is applied to 
this quantizer input. 

What is the probability that the amplitude of the input lies outside the range -4 to +4? 

Using the result of part a, show that the output SNR of the quantizer is given by 

$$(\mathrm{SNR})_O = 6R - 7.2\ \text{dB}$$

where R is the number of bits per sample. Specifically, you may assume that the quantizer input 
extends from -4 to +4. Compare the result of part b with that obtained in Example 2. 

A PCM system uses a uniform quantizer followed by a 7-bit binary encoder. The bit rate of the 
system is equal to $50 \times 10^6$ bits/s.

What is the maximum message bandwidth for which the system operates satisfactorily? 
Determine the output signal-to-(quantization) noise when a full-load sinusoidal modulating wave 
of frequency 1 MHz is applied to the input. 

Show that with a nonuniform quantizer the mean-square value of the quantization error is
approximately equal to $(1/12)\sum_i \Delta_i^2 p_i$, where $\Delta_i$ is the $i$th step size and $p_i$ is the
probability that the input signal amplitude lies within the $i$th interval. Assume that the step size
$\Delta_i$ is small compared with the excursion of the input signal.

A sinusoidal signal with an amplitude of 3.25 V is applied to a uniform quantizer of the midtread 
type whose output takes on the values 0, ±1, ±2, ±3 V. Sketch the waveform of the resulting 
quantizer output for one complete cycle of the input. 

Repeat this evaluation for the case when the quantizer is of the midrise type whose output takes
on the values ±0.5, ±1.5, ±2.5, ±3.5 V.

The signal

$$m(t)\ (\text{volts}) = 6\sin(2\pi t)$$

is transmitted using a 4-bit binary PCM system. The quantizer is of the midrise type, with a step
size of 1 V. Sketch the resulting PCM wave for one complete cycle of the input. Assume a sampling
rate of four samples per second, with samples taken at $t$ (s) = ±1/8, ±3/8, ±5/8, . . .

Figure P6.15 shows a PCM signal in which the amplitude levels of +1 V and −1 V are used to
represent binary symbols 1 and 0, respectively. The codeword used consists of three bits. Find the 
sampled version of an analog signal from which this PCM signal is derived. 

Consider a chain of (n − 1) regenerative repeaters, with a total of n sequential decisions made on a
binary PCM wave, including the final decision made at the receiver. Assume that any binary symbol
transmitted through the system has an independent probability $p_1$ of being inverted by any repeater.
Let $p_n$ represent the probability that a binary symbol is in error after transmission through the
complete system.

Show that

$$p_n = \frac{1}{2}\left[1 - (1 - 2p_1)^n\right]$$

If $p_1$ is very small and n is not too large, what is the corresponding value of $p_n$?

Discuss the basic issues involved in the design of a regenerative repeater for PCM. 

Linear Prediction 

A one-step linear predictor operates on the sampled version of a sinusoidal signal. The sampling rate
is equal to $10f_0$, where $f_0$ is the frequency of the sinusoid. The predictor has a single coefficient
denoted by $w_1$.

Determine the optimum value of $w_1$ required to minimize the prediction-error variance.
Determine the minimum value of the prediction-error variance.

A stationary process X(t) has the following values for its autocorrelation function:

$R_X(0) = 1$
$R_X(1) = 0.8$
$R_X(2) = 0.6$
$R_X(3) = 0.4$

Calculate the coefficients of an optimum linear predictor involving the use of three unit-time 
delays. 

Calculate the variance of the resulting prediction error. 

Repeat the calculations of Problem 6.19, but this time use a linear predictor with two unit-time 
delays. Compare the performance of this second optimum linear predictor with that considered in 
Problem 6.19. 

Differential Pulse-Code Modulation 

A DPCM system uses a linear predictor with a single tap. The normalized autocorrelation function 
of the input signal for a lag of one sampling interval is 0.75. The predictor is designed to minimize 
the prediction-error variance. Determine the processing gain attained by the use of this predictor. 

Calculate the improvement in processing gain of a DPCM system using the optimized three-tap 
linear predictor. For this calculation, use the autocorrelation function values of the input signal 
specified in Problem 6.19. 

In this problem, we compare the performance of a DPCM system with that of an ordinary PCM 
system using companding. 

For a sufficiently large number of representation levels, the signal-to-(quantization) noise ratio of
PCM systems, in general, is defined by

$$10\log_{10}(\mathrm{SNR})_O\ (\text{dB}) = \alpha + 6n$$

where $2^n$ is the number of representation levels. For a companded PCM system using the μ-law, the
constant α is itself defined by

$$\alpha\ (\text{dB}) \approx 4.77 - 20\log_{10}\left[\log(1 + \mu)\right]$$

For a DPCM system, on the other hand, the constant α lies in the range −3 < α < 15 dB. The
formulas quoted herein apply to telephone-quality speech signals.

Compare the performance of the DPCM system against that of the μ-companded PCM system with
μ = 255 for each of the following scenarios:

The improvement in $(\mathrm{SNR})_O$ realized by DPCM over companded PCM for the same number of
bits per sample.

The reduction in the number of bits per sample required by DPCM, compared with the
companded PCM for the same $(\mathrm{SNR})_O$.

In the DPCM system depicted in Figure P6.24, show that in the absence of channel noise, the 
transmitting and receiving prediction filters operate on slightly different input signals. 



[Figure P6.24: DPCM system (transmitter and receiver).]


Figure P6.25 depicts the block diagram of adaptive quantization for DPCM. The quantization is of a
backward estimation kind because samples of the quantization output and prediction errors are used
to continuously derive backward estimates of the variance of the message signal. This estimate
computed at time n is denoted by $\hat{\sigma}^2_{m,n}$. Given this estimate, the step size is varied so as to match the
actual variance of the message sample $m_n$, as shown by

$$\Delta_n = \phi\,\hat{\sigma}_{m,n}$$

where $\hat{\sigma}_{m,n}$ is the estimate of the standard deviation and $\phi$ is a constant. An attractive feature of the
adaptive scheme in Figure P6.25 is that samples of the quantization output and the prediction error
are used to compute the predictor's coefficients.

Modify the block diagram of the DPCM transmitter in Figure 6.19a so as to accommodate adaptive 
prediction with backward estimation. 


[Figure P6.25: Adaptive quantization with backward estimation for DPCM, showing the transmitter (input) and the receiver (output).]

Delta Modulation

Consider a test signal m(t) defined by a hyperbolic tangent function:

$$m(t) = A\tanh(\beta t)$$

where A and β are constants. Determine the minimum step size Δ for DM of this signal, which is
required to avoid slope-overload distortion.

Consider a sine wave of frequency $f_m$ and amplitude $A_m$, which is applied to a delta modulator of
step size Δ. Show that slope-overload distortion will occur if

$$A_m > \frac{\Delta}{2\pi f_m T_s}$$

where $T_s$ is the sampling period. What is the maximum power that may be transmitted without
slope-overload distortion?

A linear delta modulator is designed to operate on speech signals limited to 3.4 kHz. The 
specifications of the modulator are as follows: 

Sampling rate = $10 f_{\text{Nyquist}}$, where $f_{\text{Nyquist}}$ is the Nyquist rate of the speech signal.

Step size Δ = 100 mV.

The modulator is tested with a 1 kHz sinusoidal signal. Determine the maximum amplitude of this 
test signal required to avoid slope-overload distortion. 

In this problem, we derive an empirical formula for the average signal-to-(quantization) noise ratio of
a DM system with a sinusoidal signal of amplitude A and frequency $f_m$ as the test signal. Assume that
the power spectral density of the granular noise generated by the system is governed by the formula

$$S_N(f) = \frac{\Delta^2}{6 f_s}$$

where $f_s$ is the sampling rate and Δ is the step size. (Note that this formula is basically the same as that
for the power spectral density of quantization noise in a PCM system with Δ/2 for PCM being replaced
by Δ for DM.) The DM system is designed to handle analog message signals limited to bandwidth W.
Show that the average quantization noise power produced by the system is

$$N = \frac{4\pi^2 A^2 f_m^2 W}{3 f_s^3}$$

where it is assumed that the step size Δ has been chosen in accordance with the formula used in
Problem 6.28 so as to avoid slope-overload distortion.

Hence, determine the signal-to-(quantization) noise ratio of the DM system for a sinusoidal input.

Consider a DM system designed to accommodate analog message signals limited to bandwidth
W = 5 kHz. A sinusoidal test signal of amplitude A = 1 V and frequency $f_m$ = 1 kHz is applied to the
system. The sampling rate of the system is 50 kHz.

Calculate the step size Δ required to minimize slope-overload distortion.

Calculate the signal-to-(quantization) noise ratio of the system for the specified sinusoidal test 
signal. 

For these calculations, use the formula derived in Problem 6.29. 

Consider a low-pass signal with a bandwidth of 3 kHz. A linear DM system with step size Δ = 0.1 V
is used to process this signal at a sampling rate 10 times the Nyquist rate.

Evaluate the maximum amplitude of a test sinusoidal signal of frequency 1 kHz, which can be
processed by the system without slope-overload distortion.

For the specifications given in part a, evaluate the output SNR under (i) prefiltered and (ii)
postfiltered conditions.

In the conventional form of DM, the quantizer input may be viewed as an approximation to the
derivative of the incoming message signal m(t). This behavior leads to a drawback of DM:
transmission disturbances (e.g., noise) result in an accumulation error in the demodulated signal.
This drawback can be overcome by integrating the message signal m(t) prior to DM, resulting in
three beneficial effects:

Low-frequency content of m(t) is pre-emphasized.

Correlation between adjacent samples of m(t) is increased, tending to improve overall system 
performance by reducing the variance of the error signal at the quantizer input. 

Design of the receiver is simplified. 

Such a DM scheme is called delta-sigma modulation. 

Construct a block diagram of the delta-sigma modulation system in such a way that it provides an 
interpretation of the system as a “smoothed” version of 1-bit PCM in the following composite sense: 
smoothness implies that the comparator output is integrated prior to quantization, and 
1-bit modulation merely restates that the quantizer consists of a hard limiter with only two 
representation levels. 

Explain how the receiver of the delta-sigma modulation system is simplified, compared with 
conventional DM. 


Line Codes

In this problem, we derive the formulas used to compute the power spectra of Figure 6.25 for the five
line codes described in Section 6.10. In the case of each line code, the bit duration is $T_b$ and the pulse
amplitude A is conditioned to normalize the average power of the line code to unity as indicated in
Figure 6.25. Assume that the data stream is randomly generated and symbols 0 and 1 are equally likely.
Derive the power spectral densities of these line codes as summarized here:

Unipolar NRZ signals:



Polar NRZ signals:

$$S(f) = A^2 T_b\,\mathrm{sinc}^2(fT_b)$$


Unipolar RZ signals: 



Bipolar RZ signals: 



Manchester-encoded signals: 



Hence, confirm the spectral plots displayed in Figure 6.25. 

A randomly generated data stream consists of equiprobable binary symbols 0 and 1. It is encoded
into a polar NRZ waveform with each binary symbol being defined as follows:

$$s(t) = \begin{cases} \cos\left(\dfrac{\pi t}{T_b}\right), & -\dfrac{T_b}{2} < t \le \dfrac{T_b}{2} \\ 0, & \text{otherwise} \end{cases}$$

Sketch the waveform so generated, assuming that the data stream is 00101110.

Derive an expression for the power spectral density of this signal and sketch it. 

Compare the power spectral density of this random waveform with that defined in part b of 
Problem 6.33. 

Given the data stream 1110010100, sketch the transmitted sequence of pulses for each of the
following line codes:
unipolar NRZ 
polar NRZ 
unipolar RZ 
bipolar RZ 
Manchester code. 


Computer Experiments 

A sinusoidal signal of frequency $f_0 = 10^4/(2\pi)$ Hz is sampled at the rate of 8 kHz and then applied to
a sample-and-hold circuit to produce a flat-topped PAM signal s(t) with pulse duration T = 500 μs.
Compute the waveform of the PAM signal s(t).

Compute |S(f)|, denoting the magnitude spectrum of the PAM signal s(t).

Compute the envelope of |S(f)|. Hence confirm that the frequency at which this envelope goes
through zero for the first time is equal to (1/T) = 20 kHz.

In this problem, we use computer simulation to compare the performance of a companded PCM
system using the μ-law against that of the corresponding system using a uniform quantizer. The
simulation is to be performed for a sinusoidal input signal of varying amplitude.

With a companded PCM system in mind, Table 6.4 describes the 15-segment pseudo-linear
characteristic that consists of 15 linear segments configured to approximate the logarithmic μ-law
of (6.48), with μ = 255. This approximation is constructed in such a way that the segment endpoints
in Table 6.4 lie on the compression curve computed from (6.48).

Table 6.4 The 15-segment companding characteristic (μ = 255)

Linear segment number    Step size    Segment endpoints (horizontal axis)
0                        2            ±31
1a, 1b                   4            ±95
2a, 2b                   8            ±223
3a, 3b                   16           ±479
4a, 4b                   32           ±991
5a, 5b                   64           ±2015
6a, 6b                   128          ±4063
7a, 7b                   256          ±8159

Using the μ-law described in Table 6.4, plot the output signal-to-noise ratio as a function of the
input signal-to-noise ratio, both ratios being expressed in decibels. 

Compare the results of your computation in part (a) with a uniform quantizer having 256 
representation levels. 

In this experiment we study the linear adaptive prediction of a signal $x_n$ governed by the following
recursion:

$$x_n = 0.8x_{n-1} - 0.1x_{n-2} + 0.1v_n$$

where $v_n$ is drawn from a discrete-time white noise process of zero mean and unit variance. (A
process generated in this manner is referred to as an autoregressive process of order two.)
Specifically, the adaptive prediction is performed using the normalized LMS algorithm defined by

$$\hat{x}_n = \sum_{k=1}^{p} w_{k,n}\,x_{n-k}$$

$$e_n = x_n - \hat{x}_n$$

$$w_{k,n+1} = w_{k,n} + \frac{\mu}{\sum_{j=1}^{p} x_{n-j}^2}\,x_{n-k}\,e_n, \qquad k = 1, 2, \ldots, p$$

where p is the prediction order and μ is the normalized step-size parameter. The important point to
note here is that μ is dimensionless and stability of the algorithm is assured by choosing it in
accordance with the formula

$$0 < \mu < 2$$

The algorithm is initiated by setting

$$w_{k,0} = 0 \quad \text{for all } k$$

The learning curve of the algorithm is defined as a plot of the mean-square error versus the number
of iterations n for specified parameter values, which is obtained by averaging the plot of $e_n^2$ versus n
over a large number of different realizations of the algorithm.

Plot the learning curves for the adaptive prediction of $x_n$ for a fixed prediction order p = 5 and
three different values of step-size parameter: μ = 0.0075, 0.05, and 0.5.

What observations can you make from the learning curves of part a? 
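
As a starting point for this experiment, the normalized LMS recursion above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference solution: the function names (`nlms_predict`, `ar2_realization`, `learning_curve`), the zero initial conditions of the AR(2) process, the number of runs, and the small constant `eps` guarding against division by zero are all choices made here for illustration.

```python
import random

def nlms_predict(x, p, mu, eps=1e-12):
    """Run NLMS one-step prediction over the sequence x.

    Returns (errors, weights). eps is an assumed safeguard against a zero
    regressor norm; it is not part of the recursion stated in the problem.
    """
    w = [0.0] * p                                   # w_{k,0} = 0 for all k
    errors = []
    for n in range(p, len(x)):
        past = [x[n - k] for k in range(1, p + 1)]  # x_{n-1}, ..., x_{n-p}
        x_hat = sum(wk * xk for wk, xk in zip(w, past))
        e = x[n] - x_hat                            # prediction error e_n
        norm = sum(xk * xk for xk in past) + eps
        w = [wk + mu * xk * e / norm for wk, xk in zip(w, past)]
        errors.append(e)
    return errors, w

def ar2_realization(length, rng):
    """Generate x_n = 0.8 x_{n-1} - 0.1 x_{n-2} + 0.1 v_n from zero initial state."""
    x, x1, x2 = [], 0.0, 0.0
    for _ in range(length):
        xn = 0.8 * x1 - 0.1 * x2 + 0.1 * rng.gauss(0.0, 1.0)
        x.append(xn)
        x2, x1 = x1, xn
    return x

def learning_curve(mu, p=5, n_iter=500, n_runs=100, seed=0):
    """Average e_n^2 over n_runs independent realizations."""
    rng = random.Random(seed)
    mse = [0.0] * n_iter
    for _run in range(n_runs):
        errors, _weights = nlms_predict(ar2_realization(n_iter + p, rng), p, mu)
        for n, e in enumerate(errors):
            mse[n] += e * e / n_runs
    return mse

# Example: curves = {mu: learning_curve(mu) for mu in (0.0075, 0.05, 0.5)}
```

Each entry of the returned list is the ensemble-averaged squared error at one iteration, so plotting the three lists against the iteration index gives the learning curves requested in part a.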


In this problem, we study adaptive delta modulation, the underlying principle of which is two-fold:

If successive errors are of opposite polarity, then the delta modulator is operating in the granular
mode, in which case the step size Δ is reduced.

If, on the other hand, the successive errors are of the same polarity, then the delta modulator is
operating in the slope-overload mode, in which case the step size Δ is increased.

Parts a and b of Figure P6.39 depict the block diagrams of the transmitter and receiver of the
adaptive delta modulator, respectively, in which the step size, Δ, is increased or decreased by a factor
of 50% at each iteration of the adaptive process, as shown by:

$$\Delta_n = \begin{cases} \dfrac{\Delta_{n-1}}{m_{q,n}}\left(m_{q,n} + 0.5\,m_{q,n-1}\right) & \text{if } \Delta_{n-1} \ge \Delta_{\min} \\ \Delta_{\min} & \text{if } \Delta_{n-1} < \Delta_{\min} \end{cases}$$

where $\Delta_n$ is the step size at iteration (time step) n of the adaptation algorithm, and $m_{q,n}$ is the 1-bit
quantizer output that equals ±1.

Specifications: The input signal applied to the transmitter is sinusoidal as shown by

$$m_t = A\sin(2\pi f_m t)$$

where A = 10 and $f_m = f_s/100$, where $f_s$ is the sampling frequency; the step size $\Delta_n = 1$ for all n;
$\Delta_{\min} = 1/8$.

Using the above-described adaptation algorithm, use a computer to plot the resulting waveform 
for one complete cycle of the sinusoidal modulating signal, and also display the coded modulator 
output in the transmitter. 

For the same specifications, repeat the computation using linear modulation. 

Comment on the results obtained in parts a and b of the problem. 


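[Figure P6.39: Adaptive delta modulator. (a) Transmitter, driven by the sampled message signal $m_n$. (b) Receiver, driven by the sampled channel output and producing the reconstructed message signal.]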

Notes


1. For an exhaustive study of quantization noise in signal processing and communications, see 
Widrow and Kollar (2008). 

2. The two necessary conditions of (3.42) and (3.47) for optimality of a scalar quantizer were 
reported independently by Lloyd (1957) and Max (1960), hence the name “Lloyd-Max quantizer.” 
The derivation of these two optimality conditions presented in this chapter follows the book by 
Gersho and Gray (1992). 

3. The μ-law is used in the USA, Canada, and Japan. On the other hand, in Europe, the A-law is
used for signal compression. 

4. In actual PCM systems, the companding circuitry does not produce an exact replica of the 
nonlinear compression curves shown in Figure 6.14. Rather, it provides a piecewise linear 
approximation to the desired curve. By using a large enough number of linear segments, the 
approximation can approach the true compression curve very closely; for detailed discussion of this 
issue, see Bellamy (1991). 

5. For a discussion of noise in analog modulation systems with particular reference to FM, see 
Chapter 4 of Communication Systems (Haykin, 2001). 

6. To simplify notational matters, $\mathbf{R}_M$ is used to denote the autocorrelation matrix in (6.70) rather
than $\mathbf{R}_{MM}$ as in Chapter 4 on Stochastic Processes. To see the rationale for this simplification, the
reader is referred to (6.79). For the same reason, the practice adopted in this chapter will be
continued for the rest of the book when dealing with autocorrelation matrices and power
spectral density.

7. An optimum predictor that follows (6.77) is said to be a special case of the Wiener filter. 

8. For a detailed discussion of adaptive DPCM involving the use of adaptive quantization with 
forward estimation as well as backward estimation, the reader is referred to the classic book (Jay ant 
and Noll, 1984). 

Introduction


Chapter 6 on the conversion of analog waveforms into coded pulses represents the 
transition from analog communications to digital communications. This transition has 
been empowered by several factors: 

Ever-increasing advancement of digital silicon chips, digital signal processing, and 
computers, which, in turn, has prompted further enhancement in digital silicon 
chips, thereby repeating the cycle of improvement. 

Improved reliability, which is afforded by digital communications to a much greater 
extent than is possible with analog communications. 

Broadened range of multiplexing of users, which is enabled by the use of digital 
modulation techniques. 

Communication networks, for which, in one form or another, the use of digital 
communications is the preferred choice. 

In light of these compelling factors, we may justifiably say that we live in a “digital
communications world.” For an illustrative example, consider the remote connection of
two digital computers, with one computer acting as the information source by calculating
digital outputs based on observations and inputs fed into it; the other computer acts as the
recipient of the information. The source output consists of a sequence of 1s and 0s, with
each binary symbol being emitted every $T_b$ seconds. The transmitting part of the digital
communication system takes the 1s and 0s emitted by the source computer and encodes
them into distinct signals denoted by $s_1(t)$ and $s_2(t)$, respectively, which are suitable for
transmission over the analog channel. Both $s_1(t)$ and $s_2(t)$ are real-valued energy signals,
as shown by

$$E_i = \int_0^{T_b} s_i^2(t)\,dt, \qquad i = 1, 2 \tag{7.1}$$

With the analog channel represented by an AWGN model, depicted in Figure 7.1, the
received signal is defined by

$$x(t) = s_i(t) + w(t), \qquad 0 \le t \le T_b,\ i = 1, 2 \tag{7.2}$$
where w(t) is the channel noise. The receiver has the task of observing the received signal
x(t) for a duration of $T_b$ seconds and then making an estimate of the transmitted signal
$s_i(t)$, or equivalently the ith symbol, i = 1, 2. However, owing to the presence of channel
noise, the receiver will inevitably make occasional errors. The requirement, therefore, is to
design the receiver so as to minimize the average probability of symbol error, defined as

$$P_e = \pi_1 P(\hat{m} = 0 \mid 1\ \text{sent}) + \pi_2 P(\hat{m} = 1 \mid 0\ \text{sent}) \tag{7.3}$$

where $\pi_1$ and $\pi_2$ are the prior probabilities of transmitting symbols 1 and 0, respectively,
and $\hat{m}$ is the estimate of the symbol 1 or 0 sent by the source, which is computed by the
receiver. The $P(\hat{m} = 0 \mid 1\ \text{sent})$ and $P(\hat{m} = 1 \mid 0\ \text{sent})$ are conditional probabilities.

Figure 7.1 AWGN model of a channel: white Gaussian noise w(t) is added to the transmitted signal to give the received signal.

In minimizing the average probability of symbol error between the receiver output and
the symbol emitted by the source, the motivation is to make the digital communication
system as reliable as possible. To achieve this important design objective in a generic
setting that involves an M-ary alphabet whose symbols are denoted by $m_1, m_2, \ldots, m_M$, we
have to understand two basic issues:

How to optimize the design of the receiver so as to minimize the average probability
of symbol error.

How to choose the set of signals $s_1(t), s_2(t), \ldots, s_M(t)$ for representing the symbols
$m_1, m_2, \ldots, m_M$, respectively, since this choice affects the average probability of
symbol error.

The key question is how to develop this understanding in a principled as well as insightful
manner. The answer to this fundamental question is found in the geometric representation
of signals.

Geometric Representation of Signals


The essence of geometric representation of signals is to represent any set of M energy
signals $\{s_i(t)\}$ as linear combinations of N orthonormal basis functions, where $N \le M$.
That is to say, given a set of real-valued energy signals $s_1(t), s_2(t), \ldots, s_M(t)$, each of
duration T seconds, we write

$$s_i(t) = \sum_{j=1}^{N} s_{ij}\phi_j(t), \qquad 0 \le t \le T,\ i = 1, 2, \ldots, M \tag{7.4}$$

where the coefficients of the expansion are defined by

$$s_{ij} = \int_0^T s_i(t)\phi_j(t)\,dt, \qquad i = 1, 2, \ldots, M,\ j = 1, 2, \ldots, N \tag{7.5}$$

The real-valued basis functions $\phi_1(t), \phi_2(t), \ldots, \phi_N(t)$ form an orthonormal set, by which
we mean

$$\int_0^T \phi_i(t)\phi_j(t)\,dt = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} \tag{7.6}$$

where $\delta_{ij}$ is the Kronecker delta. The first condition of (7.6) states that each basis function
is normalized to have unit energy. The second condition states that the basis functions
$\phi_1(t), \phi_2(t), \ldots, \phi_N(t)$ are orthogonal with respect to each other over the interval $0 \le t \le T$.

For prescribed i, the set of coefficients $\{s_{ij}\}_{j=1}^{N}$ may be viewed as an N-dimensional
signal vector, denoted by $\mathbf{s}_i$. The important point to note here is that the vector $\mathbf{s}_i$ bears a
one-to-one relationship with the transmitted signal $s_i(t)$:


• Given the N elements of the vector $\mathbf{s}_i$ operating as input, we may use the scheme
shown in Figure 7.2a to generate the signal $s_i(t)$, which follows directly from (7.4).
This figure consists of a bank of N multipliers with each multiplier having its own
basis function followed by a summer. The scheme of Figure 7.2a may be viewed as
a synthesizer.

• Conversely, given the signals $s_i(t)$, i = 1, 2, ..., M, operating as input, we may use
the scheme shown in Figure 7.2b to calculate the coefficients $s_{i1}, s_{i2}, \ldots, s_{iN}$, which
follows directly from (7.5). This second scheme consists of a bank of N
product-integrators or correlators with a common input, and with each one of them supplied
with its own basis function. The scheme of Figure 7.2b may be viewed as an
analyzer.
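
The synthesizer and analyzer just described can be sketched numerically. The following is a minimal illustration rather than a definitive implementation: it assumes N = 2, a duration T = 1 s, one particular pair of orthonormal basis functions (a cosine and a sine spanning one period), and a midpoint-rule approximation of the integral in (7.5). The names `synthesize` and `analyze` are hypothetical.

```python
import math

T = 1.0                      # signal duration in seconds (assumed)
K = 1000                     # number of midpoint samples approximating the integrals
dt = T / K
t = [(k + 0.5) * dt for k in range(K)]

# Two orthonormal basis functions on [0, T] (an illustrative choice)
phi = [
    [math.sqrt(2 / T) * math.cos(2 * math.pi * tk / T) for tk in t],
    [math.sqrt(2 / T) * math.sin(2 * math.pi * tk / T) for tk in t],
]

def synthesize(s_vec):
    """Synthesizer of (7.4): s_i(t) = sum over j of s_ij * phi_j(t)."""
    return [sum(s * p[k] for s, p in zip(s_vec, phi)) for k in range(K)]

def analyze(signal):
    """Analyzer of (7.5): s_ij = integral over [0, T] of s_i(t) * phi_j(t) dt."""
    return [sum(x * pk for x, pk in zip(signal, p)) * dt for p in phi]

s_vec = [3.0, -1.5]                     # hypothetical signal vector
recovered = analyze(synthesize(s_vec))  # should reproduce s_vec
```

Passing a signal vector through the synthesizer and then the analyzer returns the original coefficients up to rounding error, which is the one-to-one relationship stated above.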



Figure 7.2 (a) Synthesizer for generating the signal $s_i(t)$. (b) Analyzer for reconstructing the signal
vector $\{\mathbf{s}_i\}$.

Figure 7.3 Illustrating the geometric representation of signals for
the case when N = 2 and M = 3.


Accordingly, we may state that each signal in the set $\{s_i(t)\}$ is completely determined by
the signal vector

$$\mathbf{s}_i = \begin{bmatrix} s_{i1} \\ s_{i2} \\ \vdots \\ s_{iN} \end{bmatrix}, \qquad i = 1, 2, \ldots, M \tag{7.7}$$

Furthermore, if we conceptually extend our conventional notion of two- and three-dimensional
Euclidean spaces to an N-dimensional Euclidean space, we may visualize the
set of signal vectors $\{\mathbf{s}_i \mid i = 1, 2, \ldots, M\}$ as defining a corresponding set of M points in an
N-dimensional Euclidean space, with N mutually perpendicular axes labeled $\phi_1, \phi_2, \ldots, \phi_N$.
This N-dimensional Euclidean space is called the signal space.

The idea of visualizing a set of energy signals geometrically, as just described, is of
profound theoretical and practical importance. It provides the mathematical basis for the
geometric representation of energy signals in a conceptually satisfying manner. This form
of representation is illustrated in Figure 7.3 for the case of a two-dimensional signal space
with three signals; that is, N = 2 and M = 3.

In an N-dimensional Euclidean space, we may define lengths of vectors and angles
between vectors. It is customary to denote the length (also called the absolute value or
norm) of a signal vector $\mathbf{s}_i$ by the symbol $\|\mathbf{s}_i\|$. The squared length of any signal vector $\mathbf{s}_i$ is
defined to be the inner product or dot product of $\mathbf{s}_i$ with itself, as shown by

$$\|\mathbf{s}_i\|^2 = \mathbf{s}_i^{\mathrm{T}}\mathbf{s}_i = \sum_{j=1}^{N} s_{ij}^2, \qquad i = 1, 2, \ldots, M \tag{7.8}$$

where $s_{ij}$ is the jth element of $\mathbf{s}_i$ and the superscript T denotes matrix transposition.

There is an interesting relationship between the energy content of a signal and its
representation as a vector. By definition, the energy of a signal of duration T seconds is

$$E_i = \int_0^T s_i^2(t)\,dt, \qquad i = 1, 2, \ldots, M \tag{7.9}$$

Therefore, substituting (7.4) into (7.9), we get

$$E_i = \int_0^T \left[\sum_{j=1}^{N} s_{ij}\phi_j(t)\right]\left[\sum_{k=1}^{N} s_{ik}\phi_k(t)\right] dt$$

Interchanging the order of summation and integration, which we can do because they are
both linear operations, and then rearranging terms we get

$$E_i = \sum_{j=1}^{N}\sum_{k=1}^{N} s_{ij}s_{ik}\int_0^T \phi_j(t)\phi_k(t)\,dt \tag{7.10}$$

Since, by definition, the $\phi_j(t)$ form an orthonormal set in accordance with the two
conditions of (7.6), we find that (7.10) reduces simply to

$$E_i = \sum_{j=1}^{N} s_{ij}^2 = \|\mathbf{s}_i\|^2 \tag{7.11}$$

Thus, (7.8) and (7.11) show that the energy of an energy signal $s_i(t)$ is equal to the squared
length of the corresponding signal vector $\mathbf{s}_i$.

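In the case of a pair of signals $s_i(t)$ and $s_k(t)$ represented by the signal vectors $\mathbf{s}_i$ and $\mathbf{s}_k$,
respectively, we may also show that

$$\int_0^T s_i(t)s_k(t)\,dt = \mathbf{s}_i^{\mathrm{T}}\mathbf{s}_k \tag{7.12}$$

Equation (7.12) states that the inner product of the energy signals $s_i(t)$ and $s_k(t)$ over the
interval [0, T] is equal to the inner product of their respective vector representations $\mathbf{s}_i$ and $\mathbf{s}_k$.

Note that the inner product $\mathbf{s}_i^{\mathrm{T}}\mathbf{s}_k$ is invariant to the choice of basis functions $\{\phi_j(t)\}_{j=1}^{N}$,
in that it only depends on the components of the signals $s_i(t)$ and $s_k(t)$ projected onto each
of the basis functions.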

Yet another useful relation involving the vector representations of the energy signals
$s_i(t)$ and $s_k(t)$ is described by

$$\|\mathbf{s}_i - \mathbf{s}_k\|^2 = \sum_{j=1}^{N}(s_{ij} - s_{kj})^2 = \int_0^T \left(s_i(t) - s_k(t)\right)^2 dt \tag{7.13}$$

where $\|\mathbf{s}_i - \mathbf{s}_k\|$ is the Euclidean distance $d_{ik}$ between the points represented by the signal
vectors $\mathbf{s}_i$ and $\mathbf{s}_k$.

To complete the geometric representation of energy signals, we need to have a
representation for the angle $\theta_{ik}$ subtended between two signal vectors $\mathbf{s}_i$ and $\mathbf{s}_k$. By
definition, the cosine of the angle $\theta_{ik}$ is equal to the inner product of these two vectors
divided by the product of their individual norms, as shown by

$$\cos(\theta_{ik}) = \frac{\mathbf{s}_i^{\mathrm{T}}\mathbf{s}_k}{\|\mathbf{s}_i\|\,\|\mathbf{s}_k\|} \tag{7.14}$$

The two vectors $\mathbf{s}_i$ and $\mathbf{s}_k$ are thus orthogonal or perpendicular to each other if their inner
product $\mathbf{s}_i^{\mathrm{T}}\mathbf{s}_k$ is zero, in which case $\theta_{ik} = 90°$; this condition is intuitively satisfying.
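
The relations (7.11) through (7.14) can be checked numerically. The sketch below is illustrative only: it assumes N = 2, T = 1 s, a cosine and sine pair as the orthonormal basis, midpoint-rule integration, and two hypothetical signal vectors chosen to be orthogonal. It computes energy, inner product, and squared distance both in the time domain and from the vectors, and then the angle between the vectors.

```python
import math

T, K = 1.0, 1000
dt = T / K
t = [(k + 0.5) * dt for k in range(K)]
phi1 = [math.sqrt(2 / T) * math.cos(2 * math.pi * tk / T) for tk in t]
phi2 = [math.sqrt(2 / T) * math.sin(2 * math.pi * tk / T) for tk in t]

def waveform(v):
    """Build s(t) = v[0] * phi1(t) + v[1] * phi2(t) on the sample grid."""
    return [v[0] * a + v[1] * b for a, b in zip(phi1, phi2)]

def integrate(samples):
    """Midpoint-rule approximation of the integral over [0, T]."""
    return sum(samples) * dt

s_i, s_k = [2.0, 1.0], [-1.0, 2.0]      # hypothetical signal vectors
x_i, x_k = waveform(s_i), waveform(s_k)

# Energy: time-domain integral versus squared length, as in (7.11)
energy_time = integrate([a * a for a in x_i])
energy_vec = sum(c * c for c in s_i)

# Inner product: time domain versus vector form, as in (7.12)
inner_time = integrate([a * b for a, b in zip(x_i, x_k)])
inner_vec = sum(a * b for a, b in zip(s_i, s_k))

# Squared Euclidean distance, as in (7.13)
dist_sq_time = integrate([(a - b) ** 2 for a, b in zip(x_i, x_k)])
dist_sq_vec = sum((a - b) ** 2 for a, b in zip(s_i, s_k))

# Angle between the two signal vectors, as in (7.14)
norm_i = math.sqrt(energy_vec)
norm_k = math.sqrt(sum(c * c for c in s_k))
angle_deg = math.degrees(math.acos(inner_vec / (norm_i * norm_k)))
```

For these two vectors the inner product is zero, so the computed angle is 90 degrees and the corresponding waveforms are orthogonal over [0, T].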


The Schwarz Inequality 

Consider any pair of energy signals $s_1(t)$ and $s_2(t)$. The Schwarz inequality states

$$\left(\int_{-\infty}^{\infty} s_1(t)s_2(t)\,dt\right)^2 \le \left(\int_{-\infty}^{\infty} s_1^2(t)\,dt\right)\left(\int_{-\infty}^{\infty} s_2^2(t)\,dt\right) \tag{7.15}$$

The equality holds if, and only if, $s_2(t) = cs_1(t)$, where c is any constant.

To prove this important inequality, let $s_1(t)$ and $s_2(t)$ be expressed in terms of the pair of
orthonormal basis functions $\phi_1(t)$ and $\phi_2(t)$ as follows:

$$s_1(t) = s_{11}\phi_1(t) + s_{12}\phi_2(t)$$
$$s_2(t) = s_{21}\phi_1(t) + s_{22}\phi_2(t)$$

where $\phi_1(t)$ and $\phi_2(t)$ satisfy the orthonormality conditions over the time interval $(-\infty, \infty)$:

$$\int_{-\infty}^{\infty} \phi_i(t)\phi_j(t)\,dt = \delta_{ij} = \begin{cases} 1 & \text{for } j = i \\ 0 & \text{otherwise} \end{cases}$$

On this basis, we may represent the signals $s_1(t)$ and $s_2(t)$ by the following respective pair
of vectors, as illustrated in Figure 7.4:

$$\mathbf{s}_1 = \begin{bmatrix} s_{11} \\ s_{12} \end{bmatrix}, \qquad \mathbf{s}_2 = \begin{bmatrix} s_{21} \\ s_{22} \end{bmatrix}$$

Figure 7.4 Vector representations of signals $s_1(t)$ and $s_2(t)$, providing
the background picture for proving the Schwarz inequality.

From Figure 7.4 we readily see that the cosine of the angle θ subtended between the vectors $\mathbf{s}_1$
and $\mathbf{s}_2$ is

$$\cos\theta = \frac{\mathbf{s}_1^{\mathrm{T}}\mathbf{s}_2}{\|\mathbf{s}_1\|\,\|\mathbf{s}_2\|} = \frac{\int_{-\infty}^{\infty} s_1(t)s_2(t)\,dt}{\left(\int_{-\infty}^{\infty} s_1^2(t)\,dt\right)^{1/2}\left(\int_{-\infty}^{\infty} s_2^2(t)\,dt\right)^{1/2}} \tag{7.16}$$

where we have made use of (7.14) and (7.12). Recognizing that $|\cos\theta| \le 1$, the Schwarz
inequality of (7.15) immediately follows from (7.16). Moreover, from the first line of
(7.16) we note that $|\cos\theta| = 1$ if, and only if, $\mathbf{s}_2 = c\mathbf{s}_1$; that is, $s_2(t) = cs_1(t)$, where c is an
arbitrary constant.

Proof of the Schwarz inequality, as presented here, applies to real-valued signals. It may
be readily extended to complex-valued signals, in which case (7.15) is reformulated as

$$\left|\int_{-\infty}^{\infty} s_1(t)s_2^*(t)\,dt\right| \le \left(\int_{-\infty}^{\infty} |s_1(t)|^2\,dt\right)^{1/2}\left(\int_{-\infty}^{\infty} |s_2(t)|^2\,dt\right)^{1/2} \tag{7.17}$$

where the asterisk denotes complex conjugation and the equality holds if, and only if,
$s_2(t) = cs_1(t)$, where c is a constant.
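
A quick numerical check of (7.15) for real-valued signals may help fix ideas. The sketch below is only an illustration under stated assumptions: the two pulses (a Gaussian and a shifted two-sided exponential) are hypothetical choices, the infinite integration range is truncated to a finite window outside which both pulses are negligible, and the integrals are approximated by a midpoint rule.

```python
import math

K = 4000
t_lo, t_hi = -10.0, 10.0     # finite window standing in for (-inf, inf)
dt = (t_hi - t_lo) / K
t = [t_lo + (k + 0.5) * dt for k in range(K)]

def integrate(samples):
    """Midpoint-rule approximation of the integral over the window."""
    return sum(samples) * dt

def schwarz_sides(s1, s2):
    """Return the left- and right-hand sides of the Schwarz inequality (7.15)."""
    lhs = integrate([a * b for a, b in zip(s1, s2)]) ** 2
    rhs = integrate([a * a for a in s1]) * integrate([b * b for b in s2])
    return lhs, rhs

s1 = [math.exp(-tk * tk) for tk in t]            # Gaussian pulse
s2 = [math.exp(-abs(tk - 1.0)) for tk in t]      # shifted two-sided exponential

lhs, rhs = schwarz_sides(s1, s2)                 # not proportional: lhs < rhs

# Equality case: s2(t) = c * s1(t) with c = -3
lhs_eq, rhs_eq = schwarz_sides(s1, [-3.0 * a for a in s1])
```

The first pair gives a strict inequality because the pulses are not proportional; the second pair, with $s_2(t) = cs_1(t)$, gives equality up to rounding error.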


Having demonstrated the elegance of the geometric representation of energy signals with
an example, how do we justify it in mathematical terms? The answer to this question lies
in the Gram-Schmidt orthogonalization procedure, for which we need a complete
orthonormal set of basis functions. To proceed with the formulation of this procedure,
suppose we have a set of M energy signals denoted by $s_1(t), s_2(t), \ldots, s_M(t)$. Starting with
$s_1(t)$ chosen from this set arbitrarily, the first basis function is defined by

$$\phi_1(t) = \frac{s_1(t)}{\sqrt{E_1}} \tag{7.18}$$

where $E_1$ is the energy of the signal $s_1(t)$.

Then, clearly, we have

$$s_1(t) = \sqrt{E_1}\,\phi_1(t) = s_{11}\phi_1(t)$$

where the coefficient $s_{11} = \sqrt{E_1}$ and $\phi_1(t)$ has unit energy as required.

Next, using the signal $s_2(t)$, we define the coefficient $s_{21}$ as

$$s_{21} = \int_0^T s_2(t)\phi_1(t)\,dt$$

We may thus introduce a new intermediate function

$$g_2(t) = s_2(t) - s_{21}\phi_1(t) \tag{7.19}$$

which is orthogonal to $\phi_1(t)$ over the interval $0 \le t \le T$ by virtue of the definition of $s_{21}$ and
the fact that the basis function $\phi_1(t)$ has unit energy. Now, we are ready to define the
second basis function as

$$\phi_2(t) = \frac{g_2(t)}{\sqrt{\int_0^T g_2^2(t)\,dt}} \tag{7.20}$$

Substituting (7.19) into (7.20) and simplifying, we get the desired result

$$\phi_2(t) = \frac{s_2(t) - s_{21}\phi_1(t)}{\sqrt{E_2 - s_{21}^2}} \tag{7.21}$$

where $E_2$ is the energy of the signal $s_2(t)$. From (7.20) we readily see that

$$\int_0^T \phi_2^2(t)\,dt = 1$$

in which case (7.21) yields

$$\int_0^T \phi_1(t)\phi_2(t)\,dt = 0$$

That is to say, $\phi_1(t)$ and $\phi_2(t)$ form an orthonormal pair as required.

Continuing the procedure in this fashion, we may, in general, define

$$g_i(t) = s_i(t) - \sum_{j=1}^{i-1} s_{ij}\phi_j(t) \tag{7.22}$$

where the coefficients $s_{ij}$ are themselves defined by

$$s_{ij} = \int_0^T s_i(t)\phi_j(t)\,dt, \qquad j = 1, 2, \ldots, i - 1 \tag{7.23}$$

For i = 1, the function $g_i(t)$ reduces to $s_i(t)$.

Given the $g_i(t)$, we may now define the set of basis functions

$$\phi_i(t) = \frac{g_i(t)}{\sqrt{\int_0^T g_i^2(t)\,dt}}, \qquad i = 1, 2, \ldots, N \tag{7.24}$$
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which form an orthonormal set. The dimension N is less than or equal to the number of 
given signals, M, depending on one of two possibilities: 

• The signals s M (t) form a linearly independent set, in which case 

N = M. 

• The signals 5 j(t), Siit), . .., s M (t) are not linearly independent, in which case N < M 
and the intermediate function g,(t) is zero for i > N. 

Note that the conventional Fourier series expansion of a periodic signal, discussed in 
Chapter 2, may be viewed as a special case of the Gram-Schmidt orthogonalization 
procedure. Moreover, the representation of a band-limited signal in terms of its samples 
taken at the Nyquist rate, discussed in Chapter 6, may be viewed as another special case.
However, in saying what we have here, two important distinctions should be made:

• The form of the basis functions $\phi_1(t), \phi_2(t), \ldots, \phi_N(t)$ has not been specified. That is
to say, unlike the Fourier series expansion of a periodic signal or the sampled
representation of a band-limited signal, we have not restricted the Gram-Schmidt
orthogonalization procedure to be in terms of sinusoidal functions (as in the Fourier
series) or sinc functions of time (as in the sampling process).

• The expansion of the signal $s_i(t)$ in terms of a finite number of terms is not an
approximation wherein only the first N terms are significant; rather, it is an exact
expression, where N and only N terms are significant.
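
The procedure can be sketched numerically. The following is a minimal illustration rather than a definitive implementation: it assumes the signals are given as uniformly spaced samples on the symbol interval, approximates each integral by a Riemann sum, and uses a hypothetical tolerance `tol` to decide when an intermediate function $g_i(t)$ is zero. The three test signals are invented for the illustration; the third is the difference of the first two, so M = 3 while N = 2.

```python
import math

def inner(a, b, dt):
    """Riemann-sum approximation of the integral of a(t) * b(t) over [0, T]."""
    return sum(x * y for x, y in zip(a, b)) * dt

def gram_schmidt(signals, dt, tol=1e-9):
    """Return (basis, coeffs) for a list of sampled energy signals.

    basis[j]  holds samples of the orthonormal function phi_j(t);
    coeffs[i] holds the coefficients s_ij of the signal s_i(t).
    """
    basis, coeffs = [], []
    for s in signals:
        # s_ij: projection of s_i(t) onto each basis function found so far
        s_ij = [inner(s, phi, dt) for phi in basis]
        # g_i(t) = s_i(t) - sum_j s_ij * phi_j(t)
        g = [s[n] - sum(c * phi[n] for c, phi in zip(s_ij, basis))
             for n in range(len(s))]
        energy = inner(g, g, dt)
        if energy > tol:
            # g_i(t) is nonzero, so it contributes a new basis function
            basis.append([v / math.sqrt(energy) for v in g])
            s_ij.append(math.sqrt(energy))
        coeffs.append(s_ij)
    return basis, coeffs

# Hypothetical test signals on [0, 1): s3(t) = s2(t) - s1(t)
K = 100
dt = 1.0 / K
s1 = [1.0 if n < K // 2 else 0.0 for n in range(K)]
s2 = [1.0] * K
s3 = [b - a for a, b in zip(s1, s2)]

basis, coeffs = gram_schmidt([s1, s2, s3], dt)
print(len(basis))  # number of basis functions N
print(coeffs)      # one row of coefficients s_ij per signal
```

Because the third signal is linearly dependent on the first two, its intermediate function vanishes and no third basis function is created; its row of coefficients then reproduces the signal exactly, in line with the second distinction noted above.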


2B1Q Code 

The 2B1Q code is the North American line code for a special class of modems called
digital subscriber lines. This code represents a quaternary PAM signal as shown in the
Gray-encoded alphabet of Table 7.1. The four possible signals $s_1(t)$, $s_2(t)$, $s_3(t)$, and $s_4(t)$
are amplitude-scaled versions of a Nyquist pulse. Each signal represents a dibit (i.e., pair
of bits). The issue of interest is to find the vector representation of the 2B1Q code.

This example is simple enough for us to solve it by inspection. Let $\phi_1(t)$ denote a pulse
normalized to have unit energy. The $\phi_1(t)$ so defined is the only basis function for the
vector representation of the 2B1Q code. Accordingly, the signal-space representation of
this code is as shown in Figure 7.5. It consists of four signal vectors $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$,
which are located on the $\phi_1$-axis in a symmetric manner about the origin. In this example,
we have M = 4 and N = 1.


Table 7.1  Amplitude levels of the 2B1Q code

Signal      Amplitude   Gray code
$s_1(t)$    -3          00
$s_2(t)$    -1          01
$s_3(t)$    +1          11
$s_4(t)$    +3          10

Figure 7.5  Signal-space representation of the 2B1Q code.


We may generalize the result depicted in Figure 7.5 for the 2B1Q code as follows: the
signal-space diagram of an M-ary PAM signal, in general, is one-dimensional with M
signal points uniformly positioned on the only axis of the diagram.
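
As a small illustration of the one-dimensional representation, the sketch below maps a bit stream to 2B1Q signal points using the alphabet of Table 7.1 and checks the Gray property. The names `LEVELS`, `encode_2b1q`, and `hamming` are hypothetical, and the coordinates are expressed in units of the common amplitude scale of the pulse.

```python
# Table 7.1: dibit -> amplitude level (Gray-encoded alphabet of the 2B1Q code)
LEVELS = {"00": -3, "01": -1, "11": +1, "10": +3}

def encode_2b1q(bits):
    """Map a bit string of even length to a list of one-dimensional signal points."""
    if len(bits) % 2:
        raise ValueError("2B1Q needs an even number of bits")
    return [LEVELS[bits[i:i + 2]] for i in range(0, len(bits), 2)]

def hamming(a, b):
    """Number of bit positions in which two dibits differ."""
    return sum(x != y for x, y in zip(a, b))

# Gray property: neighbouring amplitude levels differ in exactly one bit.
ordered = sorted(LEVELS, key=LEVELS.get)
gray_ok = all(hamming(a, b) == 1 for a, b in zip(ordered, ordered[1:]))

symbols = encode_2b1q("00011110")
print(symbols, gray_ok)
```

With Gray encoding, mistaking a signal point for one of its nearest neighbours corrupts only one of the two bits in the dibit.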


Conversion of the Continuous AWGN Channel into a 
Vector Channel 


Suppose that the input to the bank of N product integrators or correlators in Figure 7.2b is
not the transmitted signal $s_i(t)$ but rather the received signal $x(t)$ defined in accordance
with the AWGN channel of Figure 7.1. That is to say,

$$x(t) = s_i(t) + w(t), \qquad 0 \le t \le T, \quad i = 1, 2, \ldots, M \tag{7.24}$$

where $w(t)$ is a sample function of the white Gaussian noise process $W(t)$ of zero mean and
power spectral density $N_0/2$. Correspondingly, we find that the output of correlator j, say,
is the sample value of a random variable $X_j$, whose sample value is defined by

$$x_j = \int_0^T x(t)\,\phi_j(t)\,dt = s_{ij} + w_j, \qquad j = 1, 2, \ldots, N \tag{7.25}$$

The first component, $s_{ij}$, is the deterministic component of $x_j$ due to the transmitted signal
$s_i(t)$, as shown by

$$s_{ij} = \int_0^T s_i(t)\,\phi_j(t)\,dt$$

The second component, $w_j$, is the sample value of a random variable $W_j$ due to the channel
noise $w(t)$, as shown by

$$w_j = \int_0^T w(t)\,\phi_j(t)\,dt \tag{7.27}$$

Consider next a new stochastic process $X'(t)$ whose sample function $x'(t)$ is related to
the received signal $x(t)$ as follows:

$$x'(t) = x(t) - \sum_{j=1}^{N} x_j\,\phi_j(t) \tag{7.28}$$

Substituting (7.24) and (7.25) into (7.28), and then using the expansion of (7.4), we get

$$\begin{aligned}
x'(t) &= s_i(t) + w(t) - \sum_{j=1}^{N} (s_{ij} + w_j)\,\phi_j(t) \\
      &= w(t) - \sum_{j=1}^{N} w_j\,\phi_j(t) \\
      &= w'(t)
\end{aligned} \tag{7.29}$$

The sample function $x'(t)$, therefore, depends solely on the channel noise $w(t)$. On the
basis of (7.28) and (7.29), we may thus express the received signal as

$$\begin{aligned}
x(t) &= \sum_{j=1}^{N} x_j\,\phi_j(t) + x'(t) \\
     &= \sum_{j=1}^{N} x_j\,\phi_j(t) + w'(t)
\end{aligned} \tag{7.30}$$

Accordingly, we may view $w'(t)$ as a remainder term that must be included on the
right-hand side of (7.30) to preserve equality. It is informative to contrast the expansion of the
received signal $x(t)$ given in (7.30) with the corresponding expansion of the transmitted
signal $s_i(t)$ given in (7.4): the expansion of (7.4), pertaining to the transmitter, is entirely
deterministic; on the other hand, the expansion of (7.30) is random (stochastic) due to the
channel noise at the receiver input.
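
The claim that the remainder depends solely on the noise can be checked with a discrete-time sketch. The setup is an assumption made for illustration only: two sampled sinusoids serve as the orthonormal basis, the noise is a seeded pseudo-random sequence, and `remainder` is a hypothetical helper that evaluates (7.28) with integrals replaced by Riemann sums.

```python
import math
import random

random.seed(1)
K, T = 200, 1.0
dt = T / K
t = [(n + 0.5) * dt for n in range(K)]

# Two orthonormal basis functions on [0, T] (an illustrative choice)
phi = [[math.sqrt(2 / T) * math.cos(2 * math.pi * u / T) for u in t],
       [math.sqrt(2 / T) * math.sin(2 * math.pi * u / T) for u in t]]

def inner(a, b):
    """Riemann-sum approximation of the integral of a(t) * b(t) over [0, T]."""
    return sum(p * q for p, q in zip(a, b)) * dt

def remainder(x):
    """x'(t) = x(t) - sum_j x_j * phi_j(t), as in (7.28)."""
    x_j = [inner(x, p) for p in phi]
    return [x[n] - sum(c * p[n] for c, p in zip(x_j, phi)) for n in range(K)]

# One noise realization added to two different transmitted signals
w = [random.gauss(0.0, 1.0) for _ in range(K)]
s_a = [1.0 * phi[0][n] - 2.0 * phi[1][n] for n in range(K)]
s_b = [0.5 * phi[0][n] + 3.0 * phi[1][n] for n in range(K)]
r_a = remainder([s_a[n] + w[n] for n in range(K)])
r_b = remainder([s_b[n] + w[n] for n in range(K)])

# The remainder is the same whichever signal was sent ...
max_diff = max(abs(p - q) for p, q in zip(r_a, r_b))
# ... and it has no component along any basis function.
leak = max(abs(inner(r_a, p)) for p in phi)
print(max_diff, leak)
```

Both printed quantities are zero to within rounding error, which mirrors the step from the first to the second line of (7.29).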


We now wish to develop a statistical characterization of the set of N correlator outputs. Let
$X(t)$ denote the stochastic process, a sample function of which is represented by the
received signal $x(t)$. Correspondingly, let $X_j$ denote the random variable whose sample
value is represented by the correlator output $x_j$, j = 1, 2, ..., N. According to the AWGN
model of Figure 7.1, the stochastic process $X(t)$ is a Gaussian process. It follows,
therefore, that $X_j$ is a Gaussian random variable for all j in accordance with Property 1 of a
Gaussian process (Chapter 4). Hence, $X_j$ is characterized completely by its mean and
variance, which are determined next.

Let $W_j$ denote the random variable represented by the sample value $w_j$ produced by the
jth correlator in response to the white Gaussian noise component $w(t)$. The random
variable $W_j$ has zero mean because the channel noise process $W(t)$ represented by $w(t)$ in
the AWGN model of Figure 7.1 has zero mean by definition. Consequently, the mean of $X_j$
depends only on $s_{ij}$, as shown by

$$\begin{aligned}
\mu_{X_j} &= E[X_j] \\
          &= E[s_{ij} + W_j] \\
          &= s_{ij} + E[W_j] \\
          &= s_{ij}
\end{aligned}$$

To find the variance of $X_j$, we start with the definition

$$\begin{aligned}
\sigma_{X_j}^2 &= \mathrm{var}[X_j] \\
               &= E[(X_j - s_{ij})^2] \\
               &= E[W_j^2]
\end{aligned} \tag{7.32}$$

where the last line follows from (7.25) with $x_j$ and $w_j$ replaced by $X_j$ and $W_j$, respectively.
According to (7.27), the random variable $W_j$ is defined by

$$W_j = \int_0^T W(t)\,\phi_j(t)\,dt$$

We may therefore expand (7.32) as

$$\begin{aligned}
\sigma_{X_j}^2 &= E\left[\int_0^T W(t)\,\phi_j(t)\,dt \int_0^T W(u)\,\phi_j(u)\,du\right] \\
               &= E\left[\int_0^T\!\!\int_0^T \phi_j(t)\,\phi_j(u)\,W(t)\,W(u)\,dt\,du\right]
\end{aligned}$$

Interchanging the order of integration and expectation, which we can do because they are
both linear operations, we obtain

$$\begin{aligned}
\sigma_{X_j}^2 &= \int_0^T\!\!\int_0^T \phi_j(t)\,\phi_j(u)\,E[W(t)\,W(u)]\,dt\,du \\
               &= \int_0^T\!\!\int_0^T \phi_j(t)\,\phi_j(u)\,R_W(t,u)\,dt\,du
\end{aligned} \tag{7.34}$$

where $R_W(t,u)$ is the autocorrelation function of the noise process $W(t)$. Since this noise is
stationary, $R_W(t,u)$ depends only on the time difference $t - u$. Furthermore, since $W(t)$ is
white with a constant power spectral density $N_0/2$, we may express $R_W(t,u)$ as

$$R_W(t,u) = \frac{N_0}{2}\,\delta(t - u) \tag{7.35}$$

Therefore, substituting (7.35) into (7.34) and then using the sifting property of the delta
function $\delta(t)$, we get

$$\begin{aligned}
\sigma_{X_j}^2 &= \frac{N_0}{2}\int_0^T\!\!\int_0^T \phi_j(t)\,\phi_j(u)\,\delta(t - u)\,dt\,du \\
               &= \frac{N_0}{2}\int_0^T \phi_j^2(t)\,dt
\end{aligned}$$

Since the $\phi_j(t)$ have unit energy, by definition, the expression for noise variance $\sigma_{X_j}^2$
reduces to

$$\sigma_{X_j}^2 = \frac{N_0}{2} \qquad \text{for all } j$$

This important result shows that all the correlator outputs, denoted by $X_j$ with j = 1, 2, ...,
N, have a variance equal to the power spectral density $N_0/2$ of the noise process $W(t)$.




Moreover, since the basis functions $\phi_j(t)$ form an orthonormal set, $X_j$ and $X_k$ are
mutually uncorrelated, as shown by

$$\begin{aligned}
\mathrm{cov}[X_j X_k] &= E[(X_j - \mu_{X_j})(X_k - \mu_{X_k})] \\
 &= E[(X_j - s_{ij})(X_k - s_{ik})] \\
 &= E[W_j W_k] \\
 &= E\left[\int_0^T W(t)\,\phi_j(t)\,dt \int_0^T W(u)\,\phi_k(u)\,du\right] \\
 &= \int_0^T\!\!\int_0^T \phi_j(t)\,\phi_k(u)\,R_W(t,u)\,dt\,du \\
 &= \frac{N_0}{2}\int_0^T\!\!\int_0^T \phi_j(t)\,\phi_k(u)\,\delta(t - u)\,dt\,du \\
 &= \frac{N_0}{2}\int_0^T \phi_j(t)\,\phi_k(t)\,dt \\
 &= 0, \qquad j \ne k
\end{aligned} \tag{7.37}$$

Since the $X_j$ are Gaussian random variables, (7.37) implies that they are also statistically
independent in accordance with Property 4 of a Gaussian process (Chapter 4).
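
These two results lend themselves to a quick Monte Carlo check. The sketch below rests on an assumption that is not part of the derivation: continuous-time white noise of power spectral density $N_0/2$ is imitated by independent Gaussian samples of variance $(N_0/2)/\Delta t$ on a grid of spacing $\Delta t$. The basis functions, the seed, and the number of trials are likewise illustrative choices.

```python
import math
import random

random.seed(7)
N0, T, K = 2.0, 1.0, 40
dt = T / K
t = [(n + 0.5) * dt for n in range(K)]

# Two orthonormal basis functions (illustrative choice)
phi1 = [math.sqrt(2 / T) * math.cos(2 * math.pi * u / T) for u in t]
phi2 = [math.sqrt(2 / T) * math.sin(2 * math.pi * u / T) for u in t]

# Per-sample standard deviation that mimics a PSD of N0/2 (modeling assumption)
sigma = math.sqrt(N0 / 2 / dt)

def correlate(w, phi):
    """Correlator output: Riemann sum for the integral of w(t) * phi(t)."""
    return sum(a * b for a, b in zip(w, phi)) * dt

trials = 5000
w1, w2 = [], []
for _ in range(trials):
    w = [random.gauss(0.0, sigma) for _ in range(K)]
    w1.append(correlate(w, phi1))
    w2.append(correlate(w, phi2))

# The noise components have zero mean, so second moments estimate the statistics.
var1 = sum(v * v for v in w1) / trials
var2 = sum(v * v for v in w2) / trials
cov12 = sum(a * b for a, b in zip(w1, w2)) / trials
print(var1, var2, cov12)  # expect values near N0/2, N0/2, and 0
```

The estimates fluctuate from run to run if the seed is changed, but they should cluster around $N_0/2$ for the two variances and around zero for the covariance.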

Define the vector of N random variables

$$\mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}$$

whose elements are independent Gaussian random variables with mean values equal to $s_{ij}$
and variances equal to $N_0/2$. Since the elements of the vector $\mathbf{X}$ are statistically
independent, we may express the conditional probability density function of the vector $\mathbf{X}$,
given that the signal $s_i(t)$ or the corresponding symbol $m_i$ was sent, as the product of the
conditional probability density functions of its individual elements; that is,

$$f_{\mathbf{X}}(\mathbf{x}|m_i) = \prod_{j=1}^{N} f_{X_j}(x_j|m_i), \qquad i = 1, 2, \ldots, M \tag{7.39}$$

where the vector $\mathbf{x}$ and scalar $x_j$ are sample values of the random vector $\mathbf{X}$ and random
variable $X_j$, respectively. The vector $\mathbf{x}$ is called the observation vector; correspondingly, $x_j$
is called an element of the observation vector. A channel that satisfies (7.39) is said to be a
memoryless channel.

Since each $X_j$ is a Gaussian random variable with mean $s_{ij}$ and variance $N_0/2$, we have

$$f_{X_j}(x_j|m_i) = \frac{1}{\sqrt{\pi N_0}} \exp\left[-\frac{1}{N_0}(x_j - s_{ij})^2\right], \qquad
\begin{array}{l} j = 1, 2, \ldots, N \\ i = 1, 2, \ldots, M \end{array} \tag{7.40}$$

Therefore, substituting (7.40) into (7.39) yields

$$f_{\mathbf{X}}(\mathbf{x}|m_i) = (\pi N_0)^{-N/2} \exp\left[-\frac{1}{N_0}\sum_{j=1}^{N}(x_j - s_{ij})^2\right], \qquad i = 1, 2, \ldots, M \tag{7.41}$$

which completely characterizes the first term of (7.30).

However, there remains the noise term $w'(t)$ in (7.30) to be accounted for. Since the
noise process $W(t)$ represented by $w(t)$ is Gaussian with zero mean, it follows that the noise
process $W'(t)$ represented by the sample function $w'(t)$ is also a zero-mean Gaussian
process. Finally, we note that any random variable $W'(t_k)$, say, derived from the noise
process $W'(t)$ by sampling it at time $t_k$, is in fact statistically independent of the random
variable $X_j$; that is to say:

$$E[X_j W'(t_k)] = 0, \qquad \begin{cases} j = 1, 2, \ldots, N \\ 0 \le t_k \le T \end{cases} \tag{7.42}$$

Since any random variable based on the remainder noise process $W'(t)$ is independent of
the set of random variables $\{X_j\}$ as well as the set of transmitted signals $\{s_i(t)\}$, (7.42)
states that the random variable $W'(t_k)$ is irrelevant to the decision as to which particular
signal was actually transmitted. In other words, the correlator outputs determined by the 
received signal x(t) are the only data that are useful for the decision-making process; 
therefore, they represent sufficient statistics for the problem at hand. By definition, 
sufficient statistics summarize the whole of the relevant information supplied by an 
observation vector. 

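We may now summarize the results presented in this section by formulating the
theorem of irrelevance:

Insofar as signal detection in AWGN is concerned, only the projections of the noise onto
the basis functions of the signal set affect the sufficient statistics of the detection
problem; the remainder of the noise is irrelevant.
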

Putting this theorem into a mathematical context, we may say that the AWGN channel
model of Figure 7.1a is equivalent to an N-dimensional vector channel described by the
equation

$$\mathbf{x} = \mathbf{s}_i + \mathbf{w}, \qquad i = 1, 2, \ldots, M \tag{7.43}$$

where the dimension N is the number of basis functions involved in formulating the signal
vector $\mathbf{s}_i$ for all i. The individual components of the signal vector $\mathbf{s}_i$ and the additive Gaussian
noise vector w are defined by (7.5) and (7.27), respectively. The theorem of irrelevance and 
its mathematical description given in (7.43) are indeed basic to the understanding of the 
signal-detection problem as described next. Just as importantly, (7.43) may be viewed as the 
baseband version of the time-dependent received signal of (7.24). 


The conditional probability density functions $f_{\mathbf{X}}(\mathbf{x}|m_i)$, i = 1, 2, ..., M, provide the very
characterization of an AWGN channel. Their derivation leads to a functional dependence
on the observation vector $\mathbf{x}$ given the transmitted message symbol $m_i$. However, at the
receiver we have the exact opposite situation: we are given the observation vector $\mathbf{x}$ and
the requirement is to estimate the message symbol $m_i$ that is responsible for generating $\mathbf{x}$.
To emphasize this latter viewpoint, we follow Chapter 3 by introducing the idea of a
likelihood function, denoted by $l(m_i)$ and defined by

$$l(m_i) = f_{\mathbf{X}}(\mathbf{x}|m_i), \qquad i = 1, 2, \ldots, M$$

However, it is important to recall from Chapter 3 that although $l(m_i)$ and $f_{\mathbf{X}}(\mathbf{x}|m_i)$ have
exactly the same mathematical form, their individual meanings are quite different.

In practice, we find it more convenient to work with the log-likelihood function,
denoted by $L(m_i)$ and defined by

$$L(m_i) = \ln l(m_i), \qquad i = 1, 2, \ldots, M \tag{7.45}$$

where ln denotes the natural logarithm. The log-likelihood function bears a one-to-one
relationship to the likelihood function for two reasons:

• By definition, a probability density function is always nonnegative. It follows,
therefore, that the likelihood function is likewise a nonnegative quantity.

• The logarithmic function is a monotonically increasing function of its argument.

The use of (7.41) in (7.45) yields the log-likelihood function for an AWGN channel as

$$L(m_i) = -\frac{1}{N_0}\sum_{j=1}^{N}(x_j - s_{ij})^2, \qquad i = 1, 2, \ldots, M \tag{7.46}$$

where we have ignored the constant term $-(N/2)\ln(\pi N_0)$ since it bears no relation
whatsoever to the message symbol $m_i$. Recall that the $s_{ij}$, j = 1, 2, ..., N, are the elements
of the signal vector $\mathbf{s}_i$ representing the message symbol $m_i$. With (7.46) at our disposal, we
are now ready to address the basic receiver design problem.
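
To make (7.46) concrete, the sketch below evaluates the log-likelihood of every message symbol for one observation vector. The four-point constellation, the observation, and the value of $N_0$ are invented for the illustration, and `log_likelihood` is a hypothetical helper name.

```python
def log_likelihood(x, s_i, n0):
    """L(m_i) = -(1/N0) * sum_j (x_j - s_ij)^2, with the constant term dropped."""
    return -sum((x_j - s_ij) ** 2 for x_j, s_ij in zip(x, s_i)) / n0

# Hypothetical two-dimensional constellation and observation vector
constellation = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
x = (0.8, -0.3)

metrics = [log_likelihood(x, s, n0=0.5) for s in constellation]
best = max(range(len(metrics)), key=metrics.__getitem__)
print(metrics, best)
```

The largest (least negative) metric belongs to the message point nearest the observation, which anticipates the decision rules developed in the next section.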

Optimum Receivers Using Coherent Detection 


Suppose that, in each time slot of duration T seconds, one of the M possible signals $s_1(t)$,
$s_2(t)$, ..., $s_M(t)$ is transmitted with equal probability, 1/M. For geometric signal
representation, the signal $s_i(t)$, i = 1, 2, ..., M, is applied to a bank of correlators with a common input
and supplied with an appropriate set of N orthonormal basis functions, as depicted in Figure
7.2b. The resulting correlator outputs define the signal vector $\mathbf{s}_i$. Since knowledge of the
signal vector $\mathbf{s}_i$ is as good as knowing the transmitted signal $s_i(t)$ itself, and vice versa, we
may represent $s_i(t)$ by a point in a Euclidean space of dimension $N \le M$. We refer to this
point as the transmitted signal point, or message point for short. The set of message points
corresponding to the set of transmitted signals $\{s_i(t)\}_{i=1}^{M}$ is called a message constellation.

However, representation of the received signal $x(t)$ is complicated by the presence of
additive noise $w(t)$. We note that when the received signal $x(t)$ is applied to the bank of N
correlators, the correlator outputs define the observation vector $\mathbf{x}$. According to (7.43), the
vector $\mathbf{x}$ differs from the signal vector $\mathbf{s}_i$ by the noise vector $\mathbf{w}$, whose orientation is
completely random, as it should be.

The noise vector $\mathbf{w}$ is completely characterized by the channel noise $w(t)$; the converse
of this statement, however, is not true, as explained previously. The noise vector $\mathbf{w}$
represents that portion of the noise $w(t)$ that will interfere with the detection process; the
remaining portion of this noise, denoted by $w'(t)$, is tuned out by the bank of correlators
and, therefore, irrelevant.

Based on the observation vector $\mathbf{x}$, we may represent the received signal $x(t)$ by a point
in the same Euclidean space used to represent the transmitted signal. We refer to this
second point as the received signal point. Owing to the presence of noise, the received
signal point wanders about the message point in a completely random fashion, in the sense
that it may lie anywhere inside a Gaussian-distributed “cloud” centered on the message
point. This is illustrated in Figure 7.6a for the case of a three-dimensional signal space.
For a particular realization of the noise vector $\mathbf{w}$ (i.e., a particular point inside the random
cloud of Figure 7.6a) the relationship between the observation vector $\mathbf{x}$ and the signal
vector $\mathbf{s}_i$ is as illustrated in Figure 7.6b.

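We are now ready to state the signal-detection problem:

Given the observation vector $\mathbf{x}$, perform a mapping from $\mathbf{x}$ to an estimate $\hat{m}$ of the
transmitted symbol $m_i$ in a way that minimizes the probability of error in the
decision-making process.

Given the observation vector $\mathbf{x}$, suppose that we make the decision $\hat{m} = m_i$. The
probability of error in this decision, which we denote by $P_e(m_i|\mathbf{x})$, is simply

$$P_e(m_i|\mathbf{x}) = 1 - P(m_i \text{ sent}|\mathbf{x}) \tag{7.47}$$

The requirement is to minimize the average probability of error in mapping each given
observation vector $\mathbf{x}$ into a decision. On the basis of (7.47), we may, therefore, state the
optimum decision rule:

$$\text{Set } \hat{m} = m_i \text{ if } P(m_i \text{ sent}|\mathbf{x}) \ge P(m_k \text{ sent}|\mathbf{x}) \text{ for all } k \ne i \text{ and } k = 1, 2, \ldots, M \tag{7.48}$$

Figure 7.6  Illustrating the effect of (a) noise perturbation on (b) the location of the received
signal point.

The decision rule described in (7.48) is referred to as the maximum a posteriori probability
(MAP) rule. Correspondingly, the system used to implement this rule is called a maximum
a posteriori decoder.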

The requirement of (7.48) may be expressed more explicitly in terms of the prior 
probabilities of the transmitted signals and the likelihood functions, using Bayes’ rule 
discussed in Chapter 3. For the moment, ignoring possible ties in the decision-making 
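process, we may restate the MAP rule as follows:

$$\text{Set } \hat{m} = m_i \text{ if } \frac{\pi_k\, f_{\mathbf{X}}(\mathbf{x}|m_k)}{f_{\mathbf{X}}(\mathbf{x})} \text{ is maximum for } k = i \tag{7.49}$$

In (7.49), we now note the following points:

• the denominator term $f_{\mathbf{X}}(\mathbf{x})$ is independent of the transmitted symbol;

• the prior probability $\pi_k = \pi_i$ when all the source symbols are transmitted with equal
probability; and

• the conditional probability density function $f_{\mathbf{X}}(\mathbf{x}|m_k)$ bears a one-to-one relationship
to the log-likelihood function $L(m_k)$.

Accordingly, we may simply restate the decision rule of (7.49) in terms of $L(m_k)$ as
follows:

$$\text{Set } \hat{m} = m_i \text{ if } L(m_k) \text{ is maximum for } k = i \tag{7.50}$$
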

The decision rule of (7.50) is known as the maximum likelihood rule, discussed previously
in Chapter 3; the system used for its implementation is correspondingly referred to as the 
maximum likelihood decoder. According to this decision rule, a maximum likelihood 
decoder computes the log-likelihood functions as metrics for all the M possible message 
symbols, compares them, and then decides in favor of the maximum. Thus, the maximum 
likelihood decoder is a simplified version of the maximum a posteriori decoder, in that the 
M message symbols are assumed to be equally likely. 
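
The relationship between the two decoders can be illustrated with a short sketch. It assumes the AWGN log-likelihood of (7.46), so that maximizing $\pi_k f_{\mathbf{X}}(\mathbf{x}|m_k)$ is the same as maximizing $\ln \pi_k + L(m_k)$; the one-dimensional constellation, the priors, and the function names are hypothetical.

```python
import math

def log_likelihood(x, s, n0):
    """AWGN log-likelihood of (7.46), constant term dropped."""
    return -sum((a - b) ** 2 for a, b in zip(x, s)) / n0

def ml_decide(x, points, n0):
    """Index k that maximizes L(m_k)."""
    return max(range(len(points)),
               key=lambda k: log_likelihood(x, points[k], n0))

def map_decide(x, points, priors, n0):
    """Index k that maximizes ln(pi_k) + L(m_k), i.e. pi_k * f(x | m_k)."""
    return max(range(len(points)),
               key=lambda k: math.log(priors[k]) + log_likelihood(x, points[k], n0))

points = [(-1.0,), (1.0,)]  # two message points on a single axis
x = (0.3,)                  # hypothetical observation, closer to +1
n0 = 2.0

print(ml_decide(x, points, n0))                # 1: nearest message point
print(map_decide(x, points, [0.5, 0.5], n0))   # 1: equal priors, same as ML
print(map_decide(x, points, [0.9, 0.1], n0))   # 0: a strong prior overrides distance
```

With equal priors the term $\ln \pi_k$ is common to all candidates and drops out, which is exactly the simplification that turns the MAP decoder into the maximum likelihood decoder.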

It is useful to have a graphical interpretation of the maximum likelihood decision rule.
Let Z denote the N-dimensional space of all possible observation vectors $\mathbf{x}$. We refer to
this space as the observation space. Because we have assumed that the decision rule must
say $\hat{m} = m_i$, where i = 1, 2, ..., M, the total observation space Z is correspondingly
partitioned into M decision regions, denoted by $Z_1, Z_2, \ldots, Z_M$. Accordingly, we may
restate the decision rule of (7.50) as

$$\text{Observation vector } \mathbf{x} \text{ lies in region } Z_i \text{ if } L(m_k) \text{ is maximum for } k = i \tag{7.51}$$

Aside from the boundaries between the decision regions $Z_1, Z_2, \ldots, Z_M$, it is clear that this
set of regions covers the entire observation space. We now adopt the convention that all
ties are resolved at random; that is, the receiver simply makes a random guess.
Specifically, if the observation vector $\mathbf{x}$ falls on the boundary between any two decision
regions, $Z_i$ and $Z_k$, say, the choice between the two possible decisions $\hat{m} = m_i$ and
$\hat{m} = m_k$ is resolved a priori by the flip of a fair coin. Clearly, the outcome of such an
event does not affect the ultimate value of the probability of error since, on this boundary,
the condition of (7.48) is satisfied with the equality sign.

The maximum likelihood decision rule of (7.50) or its geometric counterpart described 
in (7.51) assumes that the channel noise w(t) is additive. We next specialize this rule for 
the case when w(t) is both white and Gaussian. 

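From the log-likelihood function defined in (7.46) for an AWGN channel, we note that
$L(m_k)$ attains its maximum value when the summation term

$$\sum_{j=1}^{N}(x_j - s_{kj})^2$$

is minimized by the choice k = i. Accordingly, we may formulate the maximum likelihood
decision rule for an AWGN channel as

$$\text{Observation vector } \mathbf{x} \text{ lies in region } Z_i \text{ if } \sum_{j=1}^{N}(x_j - s_{kj})^2 \text{ is minimum for } k = i \tag{7.52}$$

Note we have used “minimum” as the optimizing condition in (7.52) because the minus
sign in (7.46) has been ignored. Next, we note from the discussion presented in Section
7.2 that

$$\sum_{j=1}^{N}(x_j - s_{kj})^2 = \|\mathbf{x} - \mathbf{s}_k\|^2 \tag{7.53}$$

where $\|\mathbf{x} - \mathbf{s}_k\|$ is the Euclidean distance between the observation vector $\mathbf{x}$ at the receiver
input and the transmitted signal vector $\mathbf{s}_k$. Accordingly, we may restate the decision rule of
(7.52) as

$$\text{Observation vector } \mathbf{x} \text{ lies in region } Z_i \text{ if the Euclidean distance } \|\mathbf{x} - \mathbf{s}_k\| \text{ is minimum for } k = i \tag{7.54}$$

In words, (7.54) states that the maximum likelihood decision rule is simply to choose the
message point closest to the received signal point, which is intuitively satisfying.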

In practice, the decision rule of (7.54) is simplified by expanding the summation on the
left-hand side of (7.53) as

$$\sum_{j=1}^{N}(x_j - s_{kj})^2 = \sum_{j=1}^{N} x_j^2 - 2\sum_{j=1}^{N} x_j s_{kj} + \sum_{j=1}^{N} s_{kj}^2$$

The first summation term of this expansion is independent of the index k pertaining to the
transmitted signal vector $\mathbf{s}_k$ and, therefore, may be ignored. The second summation term is
the inner product of the observation vector $\mathbf{x}$ and the transmitted signal vector $\mathbf{s}_k$. The third
summation term is the transmitted signal energy

$$E_k = \sum_{j=1}^{N} s_{kj}^2$$

Accordingly, we may reformulate the maximum-likelihood decision rule one last time:

$$\text{Observation vector } \mathbf{x} \text{ lies in region } Z_i \text{ if } \left(\sum_{j=1}^{N} x_j s_{kj} - \frac{1}{2}E_k\right) \text{ is maximum for } k = i \tag{7.57}$$

Figure 7.7  Illustrating the partitioning of the observation space into decision regions
for the case when N = 2 and M = 4; it is assumed that the M transmitted symbols
are equally likely.

From (7.57) we infer that, for an AWGN channel, the M decision regions are bounded by
linear hyperplane boundaries. The example in Figure 7.7 illustrates this statement for
M = 4 signals and N = 2 dimensions, assuming that the signals are transmitted with equal
energy E and equal probability.
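
The equivalence between the minimum-distance rule of (7.54) and the correlation rule of (7.57) can be checked numerically. The sketch below is only an illustration: the constellation deliberately has unequal energies so that the correction term $E_k/2$ matters, and the observation vectors are drawn at random with a fixed seed. The function names are hypothetical.

```python
import random

def min_distance(x, points):
    """Rule (7.54): index of the message point closest to x."""
    return min(range(len(points)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, points[k])))

def max_correlation(x, points):
    """Rule (7.57): index maximizing the inner product x^T s_k minus E_k / 2."""
    def metric(k):
        energy = sum(v * v for v in points[k])            # E_k
        inner = sum(a * b for a, b in zip(x, points[k]))  # x^T s_k
        return inner - 0.5 * energy
    return max(range(len(points)), key=metric)

random.seed(11)
# Hypothetical constellation with unequal signal energies
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0), (-1.0, -1.0)]
samples = [(random.uniform(-4, 4), random.uniform(-4, 4)) for _ in range(2000)]

agree = all(min_distance(x, points) == max_correlation(x, points) for x in samples)
print(agree)
```

The two rules agree because the squared distance differs from $-2(\mathbf{x}^T\mathbf{s}_k - E_k/2)$ only by the term $\|\mathbf{x}\|^2$, which is common to all candidates. The boundary between any two regions is the set where two such linear metrics are equal, hence a hyperplane.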


In light of the material just presented, the optimum receiver for an AWGN channel and for
the case when the transmitted signals $s_1(t), s_2(t), \ldots, s_M(t)$ are equally likely is called a
correlation receiver; it consists of two subsystems, which are detailed in Figure 7.8:

• Detector (Figure 7.8a), which consists of N correlators supplied with a set of
orthonormal basis functions $\phi_1(t), \phi_2(t), \ldots, \phi_N(t)$ that are generated locally; this
bank of correlators operates on the received signal $x(t)$, $0 \le t \le T$, to produce the
observation vector $\mathbf{x}$.

• Maximum-likelihood decoder (Figure 7.8b), which operates on the observation
vector $\mathbf{x}$ to produce an estimate $\hat{m}$ of the transmitted symbol $m_i$, i = 1, 2, ..., M, in
such a way that the average probability of symbol error is minimized.

In accordance with the maximum likelihood decision rule of (7.57), the decoder multiplies
the N elements of the observation vector $\mathbf{x}$ by the corresponding N elements of each of the
M signal vectors $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_M$. Then, the resulting products are successively summed in
accumulators to form the corresponding set of inner products $\{\mathbf{x}^T\mathbf{s}_k \mid k = 1, 2, \ldots, M\}$.
Next, the inner products are corrected for the fact that the transmitted signal energies may
be unequal. Finally, the largest one in the resulting set of numbers is selected, and an
appropriate decision on the transmitted message is thereby made.

Figure 7.8  (a) Detector or demodulator. (b) Signal transmission decoder.


The detector shown in Figure 7.8a involves a set of correlators. Alternatively, we may use 
a different but equivalent structure in place of the correlators. To explore this alternative 
method of implementing the optimum receiver, consider a linear time-invariant filter with 
impulse response $h_j(t)$. With the received signal $x(t)$ operating as input, the resulting filter
output is defined by the convolution integral

$$y_j(t) = \int_{-\infty}^{\infty} x(\tau)\,h_j(t - \tau)\,d\tau$$

To proceed further, we evaluate this integral over the duration of a transmitted symbol,
namely $0 \le t \le T$. With time t restricted in this manner, we may replace the variable $\tau$ with
t and go on to write

$$y_j(T) = \int_0^T x(t)\,h_j(T - t)\,dt \tag{7.58}$$

Consider next a detector based on a bank of correlators. The output of the jth correlator is
defined by the first line of (7.25), reproduced here for convenience of representation:

$$x_j = \int_0^T x(t)\,\phi_j(t)\,dt \tag{7.59}$$

For $y_j(T)$ to equal $x_j$, we find from (7.58) and (7.59) that this condition is satisfied provided
that we choose

$$h_j(T - t) = \phi_j(t) \qquad \text{for } 0 \le t \le T \text{ and } j = 1, 2, \ldots, N$$

Equivalently, we may express the condition imposed on the desired impulse response of
the filter as

$$h_j(t) = \phi_j(T - t), \qquad \text{for } 0 \le t \le T \text{ and } j = 1, 2, \ldots, N \tag{7.60}$$

We may now generalize the condition described in (7.60) by stating:

Given a pulse signal $\phi(t)$ occupying the interval $0 \le t \le T$, a linear time-invariant filter is
said to be matched to the signal $\phi(t)$ if its impulse response $h(t)$ satisfies the condition
$h(t) = \phi(T - t)$ for $0 \le t \le T$.

A time-invariant filter defined in this way is called a matched filter. Correspondingly, an 
optimum receiver using matched filters in place of correlators is called a matched-filter 
receiver. Such a receiver is depicted in Figure 7.9, shown below. 
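
The equality between the matched-filter output sampled at t = T and the correlator output can be verified in discrete time. The following sketch is an approximation made under stated assumptions: signals are sampled at the midpoints of K equal subintervals, the convolution integral becomes a finite sum scaled by the sample spacing, and the sample "at t = T" is taken to be output index K - 1. The basis function, the signal amplitude, and the noise are illustrative.

```python
import math
import random

random.seed(3)
K, T = 64, 1.0
dt = T / K
t = [(n + 0.5) * dt for n in range(K)]

# A unit-energy basis function and a noisy received signal (illustrative)
phi = [math.sqrt(2 / T) * math.sin(2 * math.pi * u / T) for u in t]
x = [0.7 * p + random.gauss(0.0, 1.0) for p in phi]

def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += u * v
    return out

h = phi[::-1]                         # h(t) = phi(T - t) on the sample grid
y = [v * dt for v in convolve(x, h)]  # filter output on the sample grid

correlator = sum(a * b for a, b in zip(x, phi)) * dt  # x_j of (7.59)
matched = y[K - 1]                                    # y_j(T) of (7.58)
print(correlator, matched)
```

The two printed numbers coincide, although the filter output at other sampling instants generally differs from the correlator output; this is why the sampling instant t = T matters in the matched-filter receiver.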


Figure 7.9  Detector part of matched filter receiver; the signal transmission
decoder is as shown in Figure 7.8(b).


Probability of Error


To complete the statistical characterization of the correlation receiver of Figure 7.8a or its
equivalent, the matched filter receiver of Figure 7.9, we need to evaluate its performance
in the presence of AWGN. To do so, suppose that the observation space Z is partitioned
into a set of regions, $\{Z_i\}_{i=1}^{M}$, in accordance with the maximum likelihood decision rule.
Suppose also that symbol $m_i$ (or, equivalently, signal vector $\mathbf{s}_i$) is transmitted and an
observation vector $\mathbf{x}$ is received. Then, an error occurs whenever the received signal point
represented by $\mathbf{x}$ does not fall inside region $Z_i$ associated with the message point $\mathbf{s}_i$.
Averaging over all possible transmitted symbols assumed to be equiprobable, we see that
the average probability of symbol error is

$$\begin{aligned}
P_e &= \sum_{i=1}^{M} \pi_i\, P(\mathbf{x} \text{ does not lie in } Z_i \mid m_i \text{ sent}) \\
    &= \frac{1}{M}\sum_{i=1}^{M} P(\mathbf{x} \text{ does not lie in } Z_i \mid m_i \text{ sent}), \qquad \pi_i = 1/M \\
    &= 1 - \frac{1}{M}\sum_{i=1}^{M} P(\mathbf{x} \text{ lies in } Z_i \mid m_i \text{ sent})
\end{aligned} \tag{7.62}$$

where we have used the standard notation to denote the conditional probability of an
event. Since $\mathbf{x}$ is the sample value of random vector $\mathbf{X}$, we may rewrite (7.62) in terms of
the likelihood function as follows, given that the message symbol $m_i$ is sent:

$$P_e = 1 - \frac{1}{M}\sum_{i=1}^{M}\int_{Z_i} f_{\mathbf{X}}(\mathbf{x}|m_i)\,d\mathbf{x} \tag{7.63}$$

For an N-dimensional observation vector, the integral in (7.63) is likewise N-dimensional.


There is a uniqueness to the way in which the observation space Z is partitioned into the
set of regions $Z_1, Z_2, \ldots, Z_M$ in accordance with the maximum likelihood detection of a
signal in AWGN; that uniqueness is defined by the message constellation under study. In
particular, we may make the statement:

Changes in the orientation of the message constellation with respect to both the
coordinate axes and the origin of the signal space do not affect the probability of symbol
error $P_e$.

This statement embodies the invariance property of the average probability of symbol
error $P_e$ with respect to rotation and translation, which is the result of two facts:

• In maximum likelihood detection, the probability of symbol error $P_e$ depends solely
on the relative Euclidean distance between a received signal point and message point
in the constellation.

• The AWGN is spherically symmetric in all directions in the signal space.

To elaborate, consider first the invariance of $P_e$ with respect to rotation. The effect of a
rotation applied to all the message points in a constellation is equivalent to multiplying the
N-dimensional signal vector $\mathbf{s}_i$ by an N-by-N orthonormal matrix denoted by $\mathbf{Q}$ for all i.
By definition, the matrix $\mathbf{Q}$ satisfies the condition

$$\mathbf{Q}\mathbf{Q}^T = \mathbf{I} \tag{7.64}$$

where the superscript T denotes matrix transposition and $\mathbf{I}$ is the identity matrix whose
diagonal elements are all unity and its off-diagonal elements are all zero. According to
(7.64), the inverse of the real-valued orthonormal matrix $\mathbf{Q}$ is equal to its own transpose.
Thus, in dealing with rotation, the message vector $\mathbf{s}_i$ is replaced by its rotated version

$$\mathbf{s}_{i,\text{rotate}} = \mathbf{Q}\mathbf{s}_i, \qquad i = 1, 2, \ldots, M \tag{7.65}$$

Correspondingly, the N-by-1 noise vector $\mathbf{w}$ is replaced by its rotated version

$$\mathbf{w}_{\text{rotate}} = \mathbf{Q}\mathbf{w}$$

However, the statistical characteristics of the noise vector are unaffected by this rotation
for three reasons:


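• From Chapter 4 we recall that a linear combination of Gaussian random variables is
also Gaussian. Since the noise vector $\mathbf{w}$ is Gaussian, by assumption, then it follows
that the rotated noise vector $\mathbf{w}_{\text{rotate}}$ is also Gaussian.

• Since the noise vector $\mathbf{w}$ has zero mean, the rotated noise vector $\mathbf{w}_{\text{rotate}}$ also has zero
mean, as shown by

$$\begin{aligned}
E[\mathbf{w}_{\text{rotate}}] &= E[\mathbf{Q}\mathbf{w}] \\
 &= \mathbf{Q}E[\mathbf{w}] \\
 &= \mathbf{0}
\end{aligned}$$

• The covariance matrix of the noise vector $\mathbf{w}$ is equal to $(N_0/2)\mathbf{I}$, where $N_0/2$ is the
power spectral density of the AWGN $w(t)$ and $\mathbf{I}$ is the identity matrix; that is

$$E[\mathbf{w}\mathbf{w}^T] = \frac{N_0}{2}\mathbf{I} \tag{7.68}$$

Hence, the covariance matrix of the rotated noise vector is

$$\begin{aligned}
E[\mathbf{w}_{\text{rotate}}\mathbf{w}_{\text{rotate}}^T] &= E[\mathbf{Q}\mathbf{w}(\mathbf{Q}\mathbf{w})^T] \\
 &= E[\mathbf{Q}\mathbf{w}\mathbf{w}^T\mathbf{Q}^T] \\
 &= \mathbf{Q}E[\mathbf{w}\mathbf{w}^T]\mathbf{Q}^T \\
 &= \frac{N_0}{2}\mathbf{Q}\mathbf{Q}^T \\
 &= \frac{N_0}{2}\mathbf{I}
\end{aligned}$$

where, in the last two lines, we have made use of (7.68) and (7.64).

In light of these three reasons, we may, therefore, express the observation vector in the
rotated message constellation as

$$\mathbf{x}_{\text{rotate}} = \mathbf{Q}\mathbf{s}_i + \mathbf{w}, \qquad i = 1, 2, \ldots, M \tag{7.70}$$

Using (7.65) and (7.70), we may now express the Euclidean distance between the rotated
vectors $\mathbf{x}_{\text{rotate}}$ and $\mathbf{s}_{i,\text{rotate}}$ as

$$\begin{aligned}
\|\mathbf{x}_{\text{rotate}} - \mathbf{s}_{i,\text{rotate}}\| &= \|\mathbf{Q}\mathbf{s}_i + \mathbf{w} - \mathbf{Q}\mathbf{s}_i\| \\
 &= \|\mathbf{w}\| \\
 &= \|\mathbf{x} - \mathbf{s}_i\|, \qquad i = 1, 2, \ldots, M
\end{aligned}$$

where, in the last line, we made use of (7.43).

We may, therefore, formally state the principle of rotational invariance:

If a message constellation is rotated by the transformation $\mathbf{s}_{i,\text{rotate}} = \mathbf{Q}\mathbf{s}_i$, i = 1, 2, ..., M,
where $\mathbf{Q}$ is an orthonormal matrix, then the probability of symbol error $P_e$ incurred in
maximum likelihood signal detection over an AWGN channel is completely unchanged.
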
Illustration of Rotational Invariance 

To illustrate the principle of rotational invariance, consider the signal constellation shown 
in Figure 7.10a. The constellation is the same as that of Figure 7.10b, except for the fact 
that it has been rotated through 45°. Although these two constellations do indeed look 
different in a geometric sense, the principle of rotational invariance teaches us 
immediately that $P_e$ is the same for both of them.

Figure 7.10  A pair of signal constellations for illustrating the principle
of rotational invariance.
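
The geometric core of the argument, namely that a rotation preserves Euclidean distances and hence nearest-point decisions, can be sketched as follows. Note one simplification: the sketch rotates an entire realization (signal plus noise), which is the deterministic counterpart of the statistical argument in the text, where $\mathbf{Q}\mathbf{w}$ is replaced by $\mathbf{w}$ because both have the same statistics. The constellation and the observation are invented for the illustration.

```python
import math

def rotate(v, theta):
    """Apply the 2-by-2 rotation (an orthonormal matrix Q) to the vector v."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def dist(a, b):
    """Euclidean distance between two points in the plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest(x, points):
    """Minimum-distance (maximum likelihood) decision."""
    return min(range(len(points)), key=lambda k: dist(x, points[k]))

# Hypothetical four-point constellation, rotated through 45 degrees
points = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
theta = math.pi / 4
rotated = [rotate(s, theta) for s in points]

x = (0.9, 0.35)           # hypothetical observation vector
x_rot = rotate(x, theta)

same_distances = all(abs(dist(x, s) - dist(x_rot, r)) < 1e-12
                     for s, r in zip(points, rotated))
same_decision = nearest(x, points) == nearest(x_rot, rotated)
print(same_distances, same_decision)
```

Since every distance is preserved, the decision regions rotate rigidly with the constellation, and the spherically symmetric noise assigns the same probability to each rotated region.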


Consider next the invariance of $P_e$ to translation. Suppose all the message points in a
signal constellation are translated by a constant vector amount $\mathbf{a}$, as shown by

$$\mathbf{s}_{i,\text{translate}} = \mathbf{s}_i - \mathbf{a}, \qquad i = 1, 2, \ldots, M \tag{7.72}$$

The observation vector is correspondingly translated by the same vector amount, as shown
by

$$\mathbf{x}_{\text{translate}} = \mathbf{x} - \mathbf{a} \tag{7.73}$$

From (7.72) and (7.73) we see that the translation $\mathbf{a}$ is common to both the translated signal
vector $\mathbf{s}_i$ and translated observation vector $\mathbf{x}$. We, therefore, immediately deduce that

$$\|\mathbf{x}_{\text{translate}} - \mathbf{s}_{i,\text{translate}}\| = \|\mathbf{x} - \mathbf{s}_i\|, \qquad \text{for } i = 1, 2, \ldots, M$$

and thus formulate the principle of translational invariance:

If a signal constellation is translated by a constant vector amount, then the probability of
symbol error $P_e$ incurred in maximum likelihood signal detection over an AWGN channel
is completely unchanged.


Translation of Signal Constellation

As an example, consider the two signal constellations shown in Figure 7.11, which pertain
to a pair of different four-level PAM signals. The constellation of Figure 7.11b is the same
as that of Figure 7.11a, except for a translation 3a/2 to the right along the $\phi_1$-axis. The
principle of translational invariance teaches us that $P_e$ is the same for both of these
signal constellations.


Figure 7.11  A pair of signal constellations for illustrating the principle of translational
invariance. (a) Message points at -3a/2, -a/2, a/2, and 3a/2 on the $\phi_1$-axis. (b) The same
constellation translated by 3a/2 to the right.
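
The example can be reproduced with a few lines of arithmetic. In the sketch below the spacing a is set to 1 for convenience, the observation is hypothetical, and the shift to the right is written as subtracting a negative amount so as to follow the sign convention of (7.72) and (7.73).

```python
def translate(points, a):
    """s_translate = s - a for every message point, as in (7.72)."""
    return [s - a for s in points]

def nearest(x, points):
    """Minimum-distance decision on a one-dimensional constellation."""
    return min(range(len(points)), key=lambda k: abs(x - points[k]))

a_unit = 1.0                                       # spacing "a", set to 1 here
pam_a = [-1.5 * a_unit, -0.5 * a_unit, 0.5 * a_unit, 1.5 * a_unit]
shift = -1.5 * a_unit                              # subtracting it moves points right by 3a/2
pam_b = translate(pam_a, shift)                    # points of the shifted constellation

x = 0.7                                            # hypothetical observation
x_b = x - shift                                    # translated observation, (7.73)

same_distances = all(abs(abs(x - s) - abs(x_b - r)) < 1e-12
                     for s, r in zip(pam_a, pam_b))
same_decision = nearest(x, pam_a) == nearest(x_b, pam_b)
print(pam_b, same_distances, same_decision)
```

The distances from the observation to every message point are unchanged by the common shift, so the decision is the same in both constellations.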


For AWGN channels, the formulation of the average probability of symbol error $P_e$ is
conceptually straightforward, in that we simply substitute (7.41) into (7.63).
Unfortunately, however, numerical computation of the integral so obtained is impractical, 
except in a few simple (nevertheless, important) cases. To overcome this computational 
difficulty, we may resort to the use of bounds, which are usually adequate to predict the 
SNR (within a decibel or so) required to maintain a prescribed error rate. The 
approximation to the integral defining P e is made by simplifying the integral or 
simplifying the region of integration. In the following, we use the latter procedure to 
develop a simple yet useful upper bound, called the union bound, as an approximation to 
the average probability of symbol error for a set of M equally likely signals (symbols) in 
an AWGN channel. 

Let $A_{ik}$, with $(i,k) = 1, 2, \ldots, M$, denote the event that the observation vector $\mathbf{x}$ is closer
to the signal vector $\mathbf{s}_k$ than to $\mathbf{s}_i$, when the symbol $m_i$ (message vector $\mathbf{s}_i$) is sent. The
conditional probability of symbol error when symbol $m_i$ is sent, $P_e(m_i)$, is equal to the
probability of the union of events defined by the set $\{A_{ik}\}_{k=1,\,k \ne i}^{M}$. Probability theory
teaches us that the probability of a finite union of events is overbounded by the sum of the
probabilities of the constituent events. We may, therefore, write

$$P_e(m_i) \le \sum_{\substack{k=1 \\ k \ne i}}^{M} P(A_{ik}), \quad i = 1, 2, \ldots, M \tag{7.75}$$

Constellation of Four Message Points 

To illustrate applicability of the union bound, consider Figure 7.12 for the case of $M = 4$.
Figure 7.12a shows the four message points and associated decision regions, with the
point $\mathbf{s}_1$ assumed to represent a transmitted symbol. Figure 7.12b shows the three
constituent signal-space descriptions where, in each case, the transmitted message point $\mathbf{s}_1$
and one other message point are retained. According to Figure 7.12a the conditional
probability of symbol error, $P_e(m_1)$, is equal to the probability that the observation vector $\mathbf{x}$
lies in the shaded region of the two-dimensional signal-space diagram. Clearly, this
probability is less than the sum of the probabilities of the three individual events that $\mathbf{x}$ lies
in the shaded regions of the three constituent signal spaces depicted in Figure 7.12b.

Figure 7.12 Illustrating the union bound. (a) Constellation of four message points. (b) Three
constellations with a common message point and one other message point retained from the
original constellation.


It is important to note that, in general, the probability $P(A_{ik})$ is different from the
probability $P(\hat{m} = m_k \mid m_i)$, which is the probability that the observation vector $\mathbf{x}$ is closer
to the signal vector $\mathbf{s}_k$ (i.e., symbol $m_k$) than every other when the vector $\mathbf{s}_i$ (i.e., symbol
$m_i$) is sent. On the other hand, the probability $P(A_{ik})$ depends on only two signal vectors,
$\mathbf{s}_i$ and $\mathbf{s}_k$. To emphasize this difference, we rewrite (7.75) by adopting $p_{ik}$ in place of
$P(A_{ik})$. We thus write

$$P_e(m_i) \le \sum_{\substack{k=1 \\ k \ne i}}^{M} p_{ik}, \quad i = 1, 2, \ldots, M \tag{7.76}$$

The probability $p_{ik}$ is called the pairwise error probability, in that if a digital
communication system uses only a pair of signals, $\mathbf{s}_i$ and $\mathbf{s}_k$, then $p_{ik}$ is the probability of
the receiver mistaking $\mathbf{s}_k$ for $\mathbf{s}_i$.

Consider then a simplified digital communication system that involves the use of two
equally likely messages represented by the vectors $\mathbf{s}_i$ and $\mathbf{s}_k$. Since white Gaussian noise is
identically distributed along any set of orthogonal axes, we may temporarily choose the
first axis in such a set as one that passes through the points $\mathbf{s}_i$ and $\mathbf{s}_k$; for three illustrative
examples, see Figure 7.12b. The corresponding decision boundary is represented by the
bisector that is perpendicular to the line joining the points $\mathbf{s}_i$ and $\mathbf{s}_k$. Accordingly, when the
vector $\mathbf{s}_i$ (i.e., symbol $m_i$) is sent, and if the observation vector $\mathbf{x}$ lies on the side of the
bisector where $\mathbf{s}_k$ lies, an error is made. The probability of this event is given by

$$p_{ik} = P(\mathbf{x} \text{ is closer to } \mathbf{s}_k \text{ than } \mathbf{s}_i, \text{ when } \mathbf{s}_i \text{ is sent}) = \int_{d_{ik}/2}^{\infty} \frac{1}{\sqrt{\pi N_0}} \exp\left(-\frac{v^2}{N_0}\right) dv \tag{7.77}$$

where $d_{ik}$ in the lower limit of the integral is the Euclidean distance between signal vectors
$\mathbf{s}_i$ and $\mathbf{s}_k$; that is,

$$d_{ik} = \|\mathbf{s}_i - \mathbf{s}_k\|$$

To change the integral of (7.77) into a standard form, define a new integration variable

$$z = \sqrt{\frac{2}{N_0}}\, v$$

Equation (7.77) is then rewritten in the desired form

$$p_{ik} = \frac{1}{\sqrt{2\pi}} \int_{d_{ik}/\sqrt{2N_0}}^{\infty} \exp\left(-\frac{z^2}{2}\right) dz \tag{7.80}$$

The integral in (7.80) is the Q-function of (3.68) that was introduced in Chapter 3. In terms
of the Q-function, we may now express the probability $p_{ik}$ in the compact form

$$p_{ik} = Q\left(\frac{d_{ik}}{\sqrt{2N_0}}\right) \tag{7.81}$$
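Correspondingly, substituting (7.81) into (7.76), we write

$$P_e(m_i) \le \sum_{\substack{k=1 \\ k \ne i}}^{M} Q\left(\frac{d_{ik}}{\sqrt{2N_0}}\right), \quad i = 1, 2, \ldots, M$$

The probability of symbol error, averaged over all the $M$ symbols, is, therefore,
overbounded as follows:

$$P_e = \sum_{i=1}^{M} \pi_i P_e(m_i) \le \sum_{i=1}^{M} \sum_{\substack{k=1 \\ k \ne i}}^{M} \pi_i\, Q\left(\frac{d_{ik}}{\sqrt{2N_0}}\right) \tag{7.83}$$

where $\pi_i$ is the probability of sending symbol $m_i$.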

There are two special forms of (7.83) that are noteworthy:

1. Suppose that the signal constellation is circularly symmetric about the origin. Then,
the conditional probability of error $P_e(m_i)$ is the same for all $i$, in which case (7.83)
reduces to

$$P_e \le \sum_{\substack{k=1 \\ k \ne i}}^{M} Q\left(\frac{d_{ik}}{\sqrt{2N_0}}\right) \quad \text{for all } i$$

Figure 7.10 illustrates two examples of circularly symmetric signal constellations.

2. Define the minimum distance of a signal constellation $d_{\min}$ as the smallest Euclidean
distance between any two transmitted signal points in the constellation, as shown by

$$d_{\min} = \min_{k \ne i} d_{ik} \quad \text{for all } i \text{ and } k$$

Then, recognizing that the Q-function is a monotonically decreasing function of its
argument, we have

$$Q\left(\frac{d_{ik}}{\sqrt{2N_0}}\right) \le Q\left(\frac{d_{\min}}{\sqrt{2N_0}}\right) \quad \text{for all } i \text{ and } k$$

Therefore, in general, we may simplify the bound on the average probability of
symbol error in (7.83) as

$$P_e \le (M - 1)\, Q\left(\frac{d_{\min}}{\sqrt{2N_0}}\right) \tag{7.87}$$
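
The bounds of (7.83) and (7.87) are straightforward to evaluate for any finite constellation.
The following is a minimal sketch, assuming message points given as coordinate tuples and the
standard identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$; the function names and the example
constellation are illustrative choices, not part of the text.

```python
import itertools
import math


def q_function(x):
    """Gaussian tail probability Q(x), computed from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def union_bound(points, n0, priors=None):
    """Union bound on the average symbol error probability, in the form of (7.83)."""
    m = len(points)
    if priors is None:
        priors = [1.0 / m] * m  # equally likely symbols
    total = 0.0
    for i, s_i in enumerate(points):
        for k, s_k in enumerate(points):
            if k != i:
                d_ik = math.dist(s_i, s_k)  # Euclidean distance between message points
                total += priors[i] * q_function(d_ik / math.sqrt(2.0 * n0))
    return total


def min_distance_bound(points, n0):
    """Looser bound (M - 1) Q(d_min / sqrt(2 N0)), in the form of (7.87)."""
    d_min = min(math.dist(a, b) for a, b in itertools.combinations(points, 2))
    return (len(points) - 1) * q_function(d_min / math.sqrt(2.0 * n0))


# Hypothetical example: four points on a circle of radius 1 (unit symbol energy).
a = 1.0 / math.sqrt(2.0)
example_points = [(a, -a), (-a, -a), (-a, a), (a, a)]
print(union_bound(example_points, n0=0.1), min_distance_bound(example_points, n0=0.1))
```

Because both bounds depend on the constellation only through the distances $d_{ik}$, they are
unchanged when every point is translated by the same vector.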
The Q-function in (7.87) is itself upper bounded as



Accordingly, we may further simplify the bound on P e in (7.87) as 



In words, (7.89) states the following: 


Thus far, the only figure of merit we have used to assess the noise performance of a digital
communication system in AWGN has been the average probability of symbol (word)
error. This figure of merit is the natural choice when messages of length $m = \log_2 M$ are
transmitted, such as alphanumeric symbols. However, when the requirement is to transmit
binary data such as digital computer data, it is often more meaningful to use another figure
of merit called the BER. Although, in general, there are no unique relationships between
these two figures of merit, it is fortunate that such relationships can be derived for two
cases of practical interest, as discussed next.


M-tuples Differing in Only a Single Bit

Suppose that it is possible to perform the mapping from binary to $M$-ary symbols in such a
way that the two binary M-tuples corresponding to any pair of adjacent symbols in the $M$-ary
modulation scheme differ in only one bit position. This mapping constraint is satisfied by
using a Gray code. When the probability of symbol error $P_e$ is acceptably small, we find that
the probability of mistaking one symbol for either one of the two "nearest" symbols is
greater than any other kind of symbol error. Moreover, given a symbol error, the most
probable number of bit errors is one, subject to the aforementioned mapping constraint.
Since there are $\log_2 M$ bits per symbol, it follows that the average probability of symbol error
is related to the BER as follows:

$$\begin{aligned}
P_e &= P\left(\bigcup_{i=1}^{\log_2 M} \{i\text{th bit is in error}\}\right) \\
    &\le \sum_{i=1}^{\log_2 M} P(i\text{th bit is in error}) \\
    &= \log_2 M \cdot (\text{BER})
\end{aligned}$$

where, in the first line, $\cup$ is the symbol for "union" as used in set theory. We also note that

$$P_e \ge P(i\text{th bit is in error}) = \text{BER}$$

It follows, therefore, that the BER is bounded as follows:

$$\frac{P_e}{\log_2 M} \le \text{BER} \le P_e$$
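
The mapping constraint can be illustrated with the binary-reflected Gray code, which is one
common Gray code (the text does not say which construction is intended, so this choice is an
assumption). The sketch below checks that adjacent symbol labels, taken cyclically as in a PSK
constellation, differ in exactly one bit, and evaluates the BER bounds for a given $P_e$.

```python
import math


def gray_code(index):
    """Binary-reflected Gray code of a non-negative integer."""
    return index ^ (index >> 1)


def hamming_distance(a, b):
    """Number of bit positions in which the integers a and b differ."""
    return bin(a ^ b).count("1")


def adjacent_labels_differ_by_one_bit(num_bits):
    """True if cyclically adjacent Gray labels differ in exactly one bit position."""
    m = 1 << num_bits
    labels = [gray_code(i) for i in range(m)]
    return all(hamming_distance(labels[i], labels[(i + 1) % m]) == 1 for i in range(m))


def ber_bounds(p_e, m):
    """Lower and upper bounds on the BER: P_e / log2(M) and P_e."""
    return p_e / math.log2(m), p_e


print(adjacent_labels_differ_by_one_bit(3), ber_bounds(1e-3, 8))
```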


Number of Symbols Equal to Integer Power of 2

Suppose next $M = 2^K$, where $K$ is an integer. We assume that all symbol errors are equally
likely and occur with probability

$$\frac{P_e}{M - 1} = \frac{P_e}{2^K - 1}$$

where $P_e$ is the average probability of symbol error. To find the probability that the $i$th bit
in a symbol is in error, we note that there are $2^{K-1}$ cases of symbol error in which this
particular bit is changed and there are $2^{K-1} - 1$ cases in which it is not. Hence, the BER is

$$\text{BER} = \left(\frac{2^{K-1}}{2^K - 1}\right) P_e$$

or, equivalently,

$$\text{BER} = \left(\frac{M/2}{M - 1}\right) P_e$$

Note that, for large $M$, the BER approaches the limiting value of $P_e/2$. Note also that the
bit errors are not independent in general.
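
The counting argument can be verified by brute force. The following sketch, written under the
same assumption that every wrong symbol is equally likely, enumerates all $M - 1$ erroneous
$K$-bit symbols, counts the bit positions in which each differs from the transmitted symbol, and
compares the result with $(M/2)/(M-1)\,P_e$. The function name is illustrative.

```python
import math


def ber_equally_likely_errors(num_bits, p_e):
    """BER when each of the M - 1 wrong symbols is equally likely, found by enumeration."""
    m = 2 ** num_bits
    sent = 0  # by symmetry, any transmitted symbol gives the same count
    total_bit_errors = 0
    for received in range(m):
        if received != sent:
            total_bit_errors += bin(sent ^ received).count("1")
    # Average fraction of the K bits that are wrong, given that a symbol error occurred.
    bit_error_fraction = total_bit_errors / ((m - 1) * num_bits)
    return p_e * bit_error_fraction


for k in (1, 2, 4, 8):
    m = 2 ** k
    print(k, ber_equally_likely_errors(k, 1e-3), (m / 2) / (m - 1) * 1e-3)
```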


Phase-Shift Keying Techniques Using Coherent Detection 


With the background material on the coherent detection of signals in AWGN presented in
Sections 7.2-7.4 at our disposal, we are now ready to study specific passband
data-transmission systems. In this section, we focus on the family of phase-shift keying (PSK)
techniques, starting with the simplest member of the family discussed next.


In a binary PSK system, the pair of signals $s_1(t)$ and $s_2(t)$ used to represent binary symbols
1 and 0, respectively, is defined by

$$s_1(t) = \sqrt{\frac{2E_b}{T_b}} \cos(2\pi f_c t), \quad 0 \le t \le T_b \tag{7.95}$$

$$s_2(t) = \sqrt{\frac{2E_b}{T_b}} \cos(2\pi f_c t + \pi) = -\sqrt{\frac{2E_b}{T_b}} \cos(2\pi f_c t), \quad 0 \le t \le T_b \tag{7.96}$$

where $T_b$ is the bit duration and $E_b$ is the transmitted signal energy per bit. We find it
convenient, although not necessary, to assume that each transmitted bit contains an integral
number of cycles of the carrier wave; that is, the carrier frequency $f_c$ is chosen equal to
$n_c/T_b$ for some fixed integer $n_c$. A pair of sinusoidal waves that differ only in a relative
phase-shift of 180°, defined in (7.95) and (7.96), are referred to as antipodal signals.

Signal-Space Diagram of Binary PSK Signals

From this pair of equations it is clear that, in the case of binary PSK, there is only one
basis function of unit energy:

$$\phi_1(t) = \sqrt{\frac{2}{T_b}} \cos(2\pi f_c t), \quad 0 \le t \le T_b \tag{7.97}$$

Then, we may respectively express the transmitted signals $s_1(t)$ and $s_2(t)$ in terms of $\phi_1(t)$ as

$$s_1(t) = \sqrt{E_b}\, \phi_1(t), \quad 0 \le t \le T_b \tag{7.98}$$

$$s_2(t) = -\sqrt{E_b}\, \phi_1(t), \quad 0 \le t \le T_b \tag{7.99}$$

A binary PSK system is, therefore, characterized by having a signal space that is
one-dimensional (i.e., $N = 1$), with a signal constellation consisting of two message points
(i.e., $M = 2$). The respective coordinates of the two message points are

$$s_{11} = \int_0^{T_b} s_1(t)\, \phi_1(t)\, dt = +\sqrt{E_b}$$

$$s_{21} = \int_0^{T_b} s_2(t)\, \phi_1(t)\, dt = -\sqrt{E_b} \tag{7.101}$$

In words, the message point corresponding to $s_1(t)$ is located at $s_{11} = +\sqrt{E_b}$ and the
message point corresponding to $s_2(t)$ is located at $s_{21} = -\sqrt{E_b}$. Figure 7.13a displays the
signal-space diagram for binary PSK and Figure 7.13b shows example waveforms of
antipodal signals representing $s_1(t)$ and $s_2(t)$. Note that the binary constellation of Figure
7.13 has minimum average energy.

Figure 7.13 (a) Signal-space diagram for coherent binary PSK system. (b) The waveforms
depicting the transmitted signals $s_1(t)$ and $s_2(t)$.

Generation of a binary PSK signal follows readily from (7.97) to (7.99). Specifically, as
shown in the block diagram of Figure 7.14a, the generator (transmitter) consists of two
components:

1. Polar NRZ-level encoder, which represents symbols 1 and 0 of the incoming binary
sequence by amplitude levels $+\sqrt{E_b}$ and $-\sqrt{E_b}$, respectively.

2. Product modulator, which multiplies the output of the polar NRZ encoder by the
basis function $\phi_1(t)$; in effect, the sinusoidal $\phi_1(t)$ acts as the "carrier" of the binary
PSK signal.

Accordingly, binary PSK may be viewed as a special form of DSB-SC modulation that
was studied in Section 2.14.

Error Probability of Binary PSK Using Coherent Detection 

To make an optimum decision on the received signal $x(t)$ in favor of symbol 1 or symbol 0
(i.e., estimate the original binary sequence at the transmitter input), we assume that the
receiver has access to a locally generated replica of the basis function $\phi_1(t)$. In other
words, the receiver is synchronized with the transmitter, as shown in the block diagram of
Figure 7.14b. We may identify two basic components in the binary PSK receiver:

1. Correlator, which correlates the received signal $x(t)$ with the basis function $\phi_1(t)$ on
a bit-by-bit basis.

2. Decision device, which compares the correlator output against a zero-threshold,
assuming that binary symbols 1 and 0 are equiprobable. If the threshold is exceeded,
a decision is made in favor of symbol 1; if not, the decision is made in favor of
symbol 0. Equality of the correlator output with the zero-threshold is decided by the toss
of a fair coin (i.e., in a random manner).
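
The transmitter and receiver structures just described can be sketched in a few lines. The
example below is a noise-free, sampled-time illustration only: the integral of the correlator
is approximated by a midpoint sum, the sample count and the number of carrier cycles per bit
are hypothetical choices, and a correlator output of exactly zero is mapped to symbol 0 rather
than decided by a coin toss.

```python
import math


def basis_function(n_samples, cycles_per_bit, t_b):
    """Midpoint samples of phi_1(t) = sqrt(2/T_b) cos(2 pi f_c t), with f_c = n_c / T_b."""
    dt = t_b / n_samples
    f_c = cycles_per_bit / t_b
    return [math.sqrt(2.0 / t_b) * math.cos(2.0 * math.pi * f_c * (n + 0.5) * dt)
            for n in range(n_samples)]


def bpsk_modulate(bits, e_b, phi):
    """Polar NRZ-level encoder followed by a product modulator."""
    wave = []
    for bit in bits:
        level = math.sqrt(e_b) if bit == 1 else -math.sqrt(e_b)
        wave.extend(level * p for p in phi)
    return wave


def bpsk_demodulate(wave, phi, t_b):
    """Correlator followed by a zero-threshold decision device, bit by bit."""
    n = len(phi)
    dt = t_b / n
    decisions, outputs = [], []
    for start in range(0, len(wave), n):
        x1 = sum(x * p for x, p in zip(wave[start:start + n], phi)) * dt
        outputs.append(x1)
        decisions.append(1 if x1 > 0 else 0)  # a tie is mapped to 0 in this sketch
    return decisions, outputs


phi = basis_function(n_samples=100, cycles_per_bit=2, t_b=1.0)
tx_bits = [1, 0, 1, 1, 0]
rx_bits, correlator_outputs = bpsk_demodulate(bpsk_modulate(tx_bits, 4.0, phi), phi, 1.0)
print(rx_bits, correlator_outputs)
```

In the absence of noise, each correlator output equals $+\sqrt{E_b}$ or $-\sqrt{E_b}$, which are the
coordinates of the two message points.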

With coherent detection in place, we may apply the decision rule of (7.54). Specifically,
we partition the signal space of Figure 7.13 into two regions:

• the set of points closest to message point 1 at $+\sqrt{E_b}$; and

• the set of points closest to message point 2 at $-\sqrt{E_b}$.

This is accomplished by constructing the midpoint of the line joining these two message
points and then marking off the appropriate decision regions. In Figure 7.13, these two
decision regions are marked $Z_1$ and $Z_2$, according to the message point around which they
are constructed.

The decision rule is now simply to decide that signal $s_1(t)$ (i.e., binary symbol 1) was
transmitted if the received signal point falls in region $Z_1$ and to decide that signal $s_2(t)$
(i.e., binary symbol 0) was transmitted if the received signal point falls in region $Z_2$. Two
kinds of erroneous decisions may, however, be made:

1. Error of the first kind. Signal $s_2(t)$ is transmitted but the noise is such that the received
signal point falls inside region $Z_1$; so the receiver decides in favor of signal $s_1(t)$.

2. Error of the second kind. Signal $s_1(t)$ is transmitted but the noise is such that the
received signal point falls inside region $Z_2$; so the receiver decides in favor of signal $s_2(t)$.

Figure 7.14 Block diagrams for (a) binary PSK transmitter and (b) coherent
binary PSK receiver. The decision device chooses 1 if $x_1 > 0$ and 0 if $x_1 < 0$.


To calculate the probability of making an error of the first kind, we note from Figure 7.13a
that the decision region associated with symbol 1 or signal $s_1(t)$ is described by

$$Z_1: \; 0 < x_1 < \infty$$

where the observable element $x_1$ is related to the received signal $x(t)$ by

$$x_1 = \int_0^{T_b} x(t)\, \phi_1(t)\, dt$$

The conditional probability density function of random variable $X_1$, given that symbol 0
(i.e., signal $s_2(t)$) was transmitted, is defined by

$$f_{X_1}(x_1 \mid 0) = \frac{1}{\sqrt{\pi N_0}} \exp\left[-\frac{1}{N_0}(x_1 - s_{21})^2\right]$$

Using (7.101) in this equation yields

$$f_{X_1}(x_1 \mid 0) = \frac{1}{\sqrt{\pi N_0}} \exp\left[-\frac{1}{N_0}\left(x_1 + \sqrt{E_b}\right)^2\right]$$

The conditional probability of the receiver deciding in favor of symbol 1, given that
symbol 0 was transmitted, is therefore

$$p_{10} = \int_0^{\infty} f_{X_1}(x_1 \mid 0)\, dx_1 = \frac{1}{\sqrt{\pi N_0}} \int_0^{\infty} \exp\left[-\frac{1}{N_0}\left(x_1 + \sqrt{E_b}\right)^2\right] dx_1 \tag{7.105}$$

Putting

$$z = \sqrt{\frac{2}{N_0}}\left(x_1 + \sqrt{E_b}\right)$$

and changing the variable of integration from $x_1$ to $z$, we may compactly rewrite (7.105) in
terms of the Q-function:

$$p_{10} = \frac{1}{\sqrt{2\pi}} \int_{\sqrt{2E_b/N_0}}^{\infty} \exp\left(-\frac{z^2}{2}\right) dz \tag{7.107}$$

Using the formula of (3.68) in Chapter 3 for the Q-function in (7.107) we get

$$p_{10} = Q\left(\sqrt{\frac{2E_b}{N_0}}\right) \tag{7.108}$$

Consider next an error of the second kind. We note that the signal space of Figure 7.13a is
symmetric with respect to the origin. It follows, therefore, that $p_{01}$, the conditional
probability of the receiver deciding in favor of symbol 0, given that symbol 1 was
transmitted, also has the same value as in (7.108).

Thus, averaging the conditional error probabilities $p_{10}$ and $p_{01}$, we find that the average
probability of symbol error or, equivalently, the BER for binary PSK using coherent
detection and assuming equiprobable symbols is given by

$$P_e = Q\left(\sqrt{\frac{2E_b}{N_0}}\right) \tag{7.109}$$

As we increase the transmitted signal energy per bit $E_b$ for a specified noise spectral
density $N_0/2$, the message points corresponding to symbols 1 and 0 move further apart and
the average probability of error $P_e$ is correspondingly reduced in accordance with (7.109),
which is intuitively satisfying.
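
The result in (7.109) can be checked against a simple Monte Carlo experiment carried out
directly in signal space. The sketch below is one possible arrangement, not a prescribed
procedure: it assumes $E_b = 1$, draws the correlator output as $\pm\sqrt{E_b}$ plus a zero-mean
Gaussian sample of variance $N_0/2$, and uses a fixed, arbitrary seed so that the run is
repeatable.

```python
import math
import random


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bpsk_ber_theory(eb_over_n0):
    """Average probability of error Q(sqrt(2 E_b / N_0)) for coherent binary PSK."""
    return q_function(math.sqrt(2.0 * eb_over_n0))


def bpsk_ber_simulated(eb_over_n0, num_bits, seed=1):
    """Estimate the BER by simulating the correlator output x_1 for each bit."""
    rng = random.Random(seed)
    e_b = 1.0  # assumed energy per bit
    sigma = math.sqrt((e_b / eb_over_n0) / 2.0)  # noise standard deviation sqrt(N_0 / 2)
    errors = 0
    for _ in range(num_bits):
        bit = rng.getrandbits(1)
        message_point = math.sqrt(e_b) if bit == 1 else -math.sqrt(e_b)
        x1 = message_point + rng.gauss(0.0, sigma)
        decision = 1 if x1 > 0 else 0
        errors += decision != bit
    return errors / num_bits


print(bpsk_ber_theory(1.0), bpsk_ber_simulated(1.0, 100_000))
```

Increasing `eb_over_n0` in this sketch reproduces the behavior noted above: the message
points move further apart relative to the noise, and the estimated error rate falls.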


Power Spectra of Binary PSK Signals 

Examining (7.97) and (7.98), we see that a binary PSK wave is an example of DSB-SC
modulation that was discussed in Section 2.14. More specifically, it consists of an in-phase
component only. Let $g(t)$ denote the underlying pulse-shaping function defined by

$$g(t) = \begin{cases} \sqrt{\dfrac{2E_b}{T_b}}, & 0 \le t \le T_b \\ 0, & \text{otherwise} \end{cases}$$

Depending on whether the transmitter input is binary symbol 1 or 0, the corresponding
transmitter output is $+g(t)$ or $-g(t)$, respectively. It is assumed that the incoming binary
sequence is random, with symbols 1 and 0 being equally likely and the symbols
transmitted during the different time slots being statistically independent.

In Example 6 of Chapter 4, it was shown that the power spectral density of a random
binary wave so described is equal to the energy spectral density of the symbol shaping
function divided by the symbol duration. The energy spectral density of a
Fourier-transformable signal $g(t)$ is defined as the squared magnitude of the signal's Fourier
transform. For the binary PSK signal at hand, the baseband power spectral density is,
therefore, defined by

$$S_B(f) = \frac{2E_b \sin^2(\pi T_b f)}{(\pi T_b f)^2} = 2E_b\, \mathrm{sinc}^2(T_b f) \tag{7.111}$$

Examining (7.111), we may make the following observations on binary PSK:

1. The power spectral density $S_B(f)$ is symmetric about the vertical axis, as expected.

2. $S_B(f)$ goes through zero at multiples of the bit rate; that is, $f = \pm 1/T_b, \pm 2/T_b, \ldots$

3. With $\sin^2(\pi T_b f)$ limited to a maximum value of unity, $S_B(f)$ falls off as the inverse
square of the frequency, $f$.

These three observations are all embodied in the plot of $S_B(f)$ versus $f$ presented in Figure 7.15.

Figure 7.15 also includes a plot of the baseband power spectral density of a binary
frequency-shift keying (FSK) signal, details of which are presented in Section 7.8.
Comparison of these two spectra is deferred to that section.


The provision of reliable performance, exemplified by a very low probability of error, is 
one important goal in the design of a digital communication system. Another important 
goal is the efficient utilization of channel bandwidth. In this subsection we study a 
bandwidth-conserving modulation scheme known as quadriphase-shift keying (QPSK), 
using coherent detection. 

Figure 7.15 Power spectra of binary PSK and FSK signals.

As with binary PSK, information about the message symbols in QPSK is contained in
the carrier phase. In particular, the phase of the carrier takes on one of four equally spaced
values, such as $\pi/4$, $3\pi/4$, $5\pi/4$, and $7\pi/4$. For this set of values, we may define the
transmitted signal as

$$s_i(t) = \begin{cases} \sqrt{\dfrac{2E}{T}} \cos\left[2\pi f_c t + (2i - 1)\dfrac{\pi}{4}\right], & 0 \le t \le T \\ 0, & \text{elsewhere} \end{cases} \qquad i = 1, 2, 3, 4 \tag{7.112}$$

where $E$ is the transmitted signal energy per symbol and $T$ is the symbol duration. The
carrier frequency $f_c$ equals $n_c/T$ for some fixed integer $n_c$. Each possible value of the phase
corresponds to a unique dibit (i.e., pair of bits). Thus, for example, we may choose the
foregoing set of phase values to represent the Gray-encoded set of dibits, 10, 00, 01, and
11, where only a single bit is changed from one dibit to the next.


Signal-Space Diagram of QPSK Signals 

Using a well-known trigonometric identity, we may expand (7.112) to redefine the
transmitted signal in the canonical form:

$$s_i(t) = \sqrt{\frac{2E}{T}} \cos\left[(2i - 1)\frac{\pi}{4}\right] \cos(2\pi f_c t) - \sqrt{\frac{2E}{T}} \sin\left[(2i - 1)\frac{\pi}{4}\right] \sin(2\pi f_c t) \tag{7.113}$$

where $i = 1, 2, 3, 4$. Based on this representation, we make two observations:

1. There are two orthonormal basis functions, defined by a pair of quadrature carriers:

$$\phi_1(t) = \sqrt{\frac{2}{T}} \cos(2\pi f_c t), \quad 0 \le t \le T \tag{7.114}$$

$$\phi_2(t) = \sqrt{\frac{2}{T}} \sin(2\pi f_c t), \quad 0 \le t \le T \tag{7.115}$$

2. There are four message points, defined by the two-dimensional signal vector

$$\mathbf{s}_i = \begin{bmatrix} \sqrt{E} \cos\left((2i - 1)\dfrac{\pi}{4}\right) \\ -\sqrt{E} \sin\left((2i - 1)\dfrac{\pi}{4}\right) \end{bmatrix}, \quad i = 1, 2, 3, 4$$

Elements of the signal vectors, namely $s_{i1}$ and $s_{i2}$, have their values summarized in
Table 7.2; the first two columns give the associated dibit and phase of the QPSK signal.

Table 7.2 Signal-space characterization of QPSK

Dibit    Phase of QPSK signal (radians)    $s_{i1}$          $s_{i2}$
10       $\pi/4$                           $+\sqrt{E/2}$     $-\sqrt{E/2}$
00       $3\pi/4$                          $-\sqrt{E/2}$     $-\sqrt{E/2}$
01       $5\pi/4$                          $-\sqrt{E/2}$     $+\sqrt{E/2}$
11       $7\pi/4$                          $+\sqrt{E/2}$     $+\sqrt{E/2}$

Accordingly, a QPSK signal has a two-dimensional signal constellation (i.e., $N = 2$) and
four message points (i.e., $M = 4$) whose phase angles increase in a counterclockwise
direction, as illustrated in Figure 7.16. As with binary PSK, the QPSK signal has minimum
average energy.

Figure 7.16 Signal-space diagram of QPSK system.
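
The coordinates of the four message points follow directly from the signal vector defined
above. The sketch below evaluates them for $i = 1, \ldots, 4$ and labels each point with a dibit,
assuming the same convention as the polar NRZ-level encoder (a positive coordinate
represents bit 1, a negative coordinate bit 0). The function names are illustrative.

```python
import math


def qpsk_message_point(i, energy):
    """Coordinates (s_i1, s_i2) of message point i for carrier phase (2i - 1) pi / 4."""
    theta = (2 * i - 1) * math.pi / 4.0
    return (math.sqrt(energy) * math.cos(theta), -math.sqrt(energy) * math.sin(theta))


def dibit_from_point(point):
    """Assumed labeling: positive coordinate -> bit 1, negative coordinate -> bit 0."""
    return "".join("1" if coordinate > 0 else "0" for coordinate in point)


dibits = {i: dibit_from_point(qpsk_message_point(i, 1.0)) for i in range(1, 5)}
print(dibits)
```

With this convention, the phases $\pi/4$, $3\pi/4$, $5\pi/4$, and $7\pi/4$ map to the dibits 10, 00, 01,
and 11, so that message points adjacent on the circle differ in a single bit.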


QPSK Waveforms 

Figure 7.17 illustrates the sequences and waveforms involved in the generation of a QPSK
signal. The input binary sequence 01101000 is shown in Figure 7.17a. This sequence is
divided into two other sequences, consisting of odd- and even-numbered bits of the input
sequence. These two sequences are shown in the top lines of Figure 7.17b and c. The
waveforms representing the two components of the QPSK signal, namely $s_{i1}\phi_1(t)$ and
$s_{i2}\phi_2(t)$, are also shown in Figure 7.17b and c, respectively. These two waveforms may
individually be viewed as examples of a binary PSK signal. Adding them, we get the
QPSK waveform shown in Figure 7.17d.

To define the decision rule for the coherent detection of the transmitted data sequence,
we partition the signal space into four regions, in accordance with Table 7.2. The
individual regions are defined by the set of symbols closest to the message point
represented by message vectors $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$. This is readily accomplished by
constructing the perpendicular bisectors of the square formed by joining the four message
points and then marking off the appropriate regions. We thus find that the decision regions
are quadrants whose vertices coincide with the origin. These regions are marked $Z_1$, $Z_2$,
$Z_3$, and $Z_4$ in Figure 7.16, according to the message point around which they are
constructed.

Figure 7.17 (a) Input binary sequence 01101000, grouped into dibits 01, 10, 10, 00.
(b) Odd-numbered bits of input sequence (0 1 1 0; polarity of coefficient $s_{i1}$: − + + −) and
associated binary PSK signal $s_{i1}\phi_1(t)$. (c) Even-numbered bits of input sequence (1 0 0 0;
polarity of coefficient $s_{i2}$: + − − −) and associated binary PSK signal $s_{i2}\phi_2(t)$.
(d) QPSK waveform defined as $s(t) = s_{i1}\phi_1(t) + s_{i2}\phi_2(t)$.
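
The splitting of the input sequence 01101000 into its odd- and even-numbered bits, and the
resulting polarities of the two coefficients, can be reproduced with a short sketch. The helper
names are illustrative; the polarity convention (bit 1 positive, bit 0 negative) is the one used
by the polar NRZ-level encoder.

```python
def demultiplex(bits):
    """Split a bit sequence into its odd-numbered (1st, 3rd, ...) and even-numbered bits."""
    return bits[0::2], bits[1::2]


def polarity(bits):
    """Polarity of the corresponding coefficient: '+' for bit 1, '-' for bit 0."""
    return "".join("+" if bit == 1 else "-" for bit in bits)


input_bits = [0, 1, 1, 0, 1, 0, 0, 0]
odd_bits, even_bits = demultiplex(input_bits)
print(odd_bits, polarity(odd_bits))    # in-phase stream and polarity of s_i1
print(even_bits, polarity(even_bits))  # quadrature stream and polarity of s_i2
```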


Generation and Coherent Detection of QPSK Signals 

Expanding on the binary PSK transmitter of Figure 7.14a, we may build on (7.113) to
(7.115) to construct the QPSK transmitter shown in Figure 7.18a. A distinguishing feature
of the QPSK transmitter is the block labeled demultiplexer. The function of the
demultiplexer is to divide the binary wave produced by the polar NRZ-level encoder into
two separate binary waves, one of which represents the odd-numbered bits in the
incoming binary sequence and the other represents the even-numbered bits.
Accordingly, we may make the following statement:

The QPSK transmitter may be viewed as two binary PSK generators that work in
parallel, each at a bit rate equal to one-half the bit rate of the original binary
sequence at the QPSK transmitter input.

Figure 7.18 Block diagram of (a) QPSK transmitter and (b) coherent QPSK receiver.


Expanding on the binary PSK receiver of Figure 7.14b, we find that the QPSK receiver is
structured in the form of an in-phase path and a quadrature path, working in parallel as
depicted in Figure 7.18b. The functional composition of the QPSK receiver is as follows:

1. Pair of correlators, which have a common input $x(t)$. The two correlators are
supplied with a pair of locally generated orthonormal basis functions $\phi_1(t)$ and $\phi_2(t)$,
which means that the receiver is synchronized with the transmitter. The correlator
outputs, produced in response to the received signal $x(t)$, are denoted by $x_1$ and $x_2$,
respectively.

2. Pair of decision devices, which act on the correlator outputs $x_1$ and $x_2$ by comparing
each one with a zero-threshold; here, it is assumed that the symbols 1 and 0 in the
original binary stream at the transmitter input are equally likely. If $x_1 > 0$, a decision
is made in favor of symbol 1 for the in-phase channel output; on the other hand, if
$x_1 < 0$, then a decision is made in favor of symbol 0. Similar binary decisions are
made for the quadrature channel.

3. Multiplexer, the function of which is to combine the two binary sequences produced
by the pair of decision devices. The resulting binary sequence so produced provides
an estimate of the original binary stream at the transmitter input.

Error Probability of QPSK

In a QPSK system operating on an AWGN channel, the received signal $x(t)$ is defined by

$$x(t) = s_i(t) + w(t), \quad 0 \le t \le T, \quad i = 1, 2, 3, 4$$

where $w(t)$ is the sample function of a white Gaussian noise process of zero mean and
power spectral density $N_0/2$.

Referring to Figure 7.18b, we see that the two correlator outputs, $x_1$ and $x_2$, are
respectively defined as follows:

$$x_1 = \int_0^{T} x(t)\, \phi_1(t)\, dt = \sqrt{E} \cos\left[(2i - 1)\frac{\pi}{4}\right] + w_1 = \pm\sqrt{\frac{E}{2}} + w_1 \tag{7.118}$$

and

$$x_2 = \int_0^{T} x(t)\, \phi_2(t)\, dt = -\sqrt{E} \sin\left[(2i - 1)\frac{\pi}{4}\right] + w_2 = \mp\sqrt{\frac{E}{2}} + w_2 \tag{7.119}$$

Thus, the observable elements $x_1$ and $x_2$ are sample values of independent Gaussian
random variables with mean values equal to $\pm\sqrt{E/2}$ and $\mp\sqrt{E/2}$, respectively, and with
a common variance equal to $N_0/2$.

The decision rule is now simply to say that $s_1(t)$ was transmitted if the received signal
point associated with the observation vector $\mathbf{x}$ falls inside region $Z_1$; say that $s_2(t)$ was
transmitted if the received signal point falls inside region $Z_2$, and so on for the other two
regions $Z_3$ and $Z_4$. An erroneous decision will be made if, for example, signal $s_4(t)$ is
transmitted but the noise $w(t)$ is such that the received signal point falls outside region $Z_4$.

To calculate the average probability of symbol error, recall that a QPSK receiver is in
fact equivalent to two binary PSK receivers working in parallel and using two carriers that
are in phase quadrature. The in-phase channel output $x_1$ and the quadrature channel output $x_2$
(i.e., the two elements of the observation vector $\mathbf{x}$) may be viewed as the individual
outputs of two binary PSK receivers. Thus, according to (7.118) and (7.119), these two
binary PSK receivers are characterized as follows:

• signal energy per bit equal to $E/2$, and

• noise spectral density equal to $N_0/2$.

Hence, using (7.109) for the average probability of bit error of a coherent binary PSK
receiver, we may express the average probability of bit error in the in-phase and
quadrature paths of the coherent QPSK receiver as

$$P' = Q\left(\sqrt{\frac{E}{N_0}}\right) \tag{7.120}$$

where $E$ is written in place of $2E_b$. Another important point to note is that the bit errors in
the in-phase and quadrature paths of the QPSK receiver are statistically independent. The
decision device in the in-phase path accounts for one of the two bits constituting a symbol
(dibit) of the QPSK signal, and the decision device in the quadrature path takes care of the
other bit. Accordingly, the average probability of a correct detection resulting from the
combined action of the two channels (paths) working together is

$$P_c = (1 - P')^2 = \left[1 - Q\left(\sqrt{\frac{E}{N_0}}\right)\right]^2 = 1 - 2Q\left(\sqrt{\frac{E}{N_0}}\right) + Q^2\left(\sqrt{\frac{E}{N_0}}\right)$$

The average probability of symbol error for QPSK is therefore

$$P_e = 1 - P_c = 2Q\left(\sqrt{\frac{E}{N_0}}\right) - Q^2\left(\sqrt{\frac{E}{N_0}}\right) \tag{7.122}$$

In the region where $(E/N_0) \gg 1$, we may ignore the quadratic term on the right-hand side of
(7.122), so the average probability of symbol error for the QPSK receiver is approximated as

$$P_e \approx 2Q\left(\sqrt{\frac{E}{N_0}}\right) \tag{7.123}$$
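
To see how close the approximation of (7.123) is to the exact expression of (7.122), the two
may be evaluated side by side and compared with a signal-space simulation. The sketch below
assumes unit symbol energy, treats the two correlator outputs as independent Gaussian
variables of variance $N_0/2$ centered on $\pm\sqrt{E/2}$, and uses an arbitrary fixed seed; all
function names are illustrative.

```python
import math
import random


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def qpsk_symbol_error_exact(e_over_n0):
    """Exact symbol error probability 2Q - Q^2, with Q evaluated at sqrt(E / N_0)."""
    p = q_function(math.sqrt(e_over_n0))
    return 2.0 * p - p * p


def qpsk_symbol_error_approx(e_over_n0):
    """High-SNR approximation 2Q(sqrt(E / N_0)), obtained by dropping the quadratic term."""
    return 2.0 * q_function(math.sqrt(e_over_n0))


def qpsk_symbol_error_simulated(e_over_n0, num_symbols, seed=1):
    """Estimate the symbol error rate from simulated correlator outputs x_1 and x_2."""
    rng = random.Random(seed)
    energy = 1.0  # assumed energy per symbol
    sigma = math.sqrt((energy / e_over_n0) / 2.0)
    amplitude = math.sqrt(energy / 2.0)
    errors = 0
    for _ in range(num_symbols):
        s1 = amplitude if rng.getrandbits(1) else -amplitude
        s2 = amplitude if rng.getrandbits(1) else -amplitude
        x1 = s1 + rng.gauss(0.0, sigma)
        x2 = s2 + rng.gauss(0.0, sigma)
        # A symbol error occurs if either zero-threshold decision is wrong.
        if (x1 > 0) != (s1 > 0) or (x2 > 0) != (s2 > 0):
            errors += 1
    return errors / num_symbols


ratio = 4.0  # hypothetical E / N_0
print(qpsk_symbol_error_exact(ratio), qpsk_symbol_error_approx(ratio),
      qpsk_symbol_error_simulated(ratio, 100_000))
```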



Equation (7.123) may also be derived in another insightful way, using the signal-space
diagram of Figure 7.16. Since the four message points of this diagram are circularly
symmetric with respect to the origin, we may apply the approximate formula of (7.85)
based on the union bound. Consider, for example, message point $m_1$ (corresponding to
dibit 10) chosen as the transmitted message point. The message points $m_2$ and $m_4$
(corresponding to dibits 00 and 11) are the closest to $m_1$. From Figure 7.16 we readily find
that $m_1$ is equidistant from $m_2$ and $m_4$ in a Euclidean sense, as shown by

$$d_{12} = d_{14} = \sqrt{2E}$$

Assuming that $E/N_0$ is large enough to ignore the contribution of the most distant message
point $m_3$ (corresponding to dibit 01) relative to $m_1$, we find that the use of (7.85) with the
equality sign yields an approximate expression for $P_e$ that is the same as that of (7.123).
Note that in mistaking either $m_2$ or $m_4$ for $m_1$, a single bit error is made; on the other hand,
in mistaking $m_3$ for $m_1$, two bit errors are made. For a high enough $E/N_0$, the likelihood of
both bits of a symbol being in error is much less than that of a single bit, which is a further
justification for ignoring $m_3$ in calculating $P_e$ when $m_1$ is sent.

In a QPSK system, we note that since there are two bits per symbol, the transmitted
signal energy per symbol is twice the signal energy per bit, as shown by

$$E = 2E_b \tag{7.124}$$

Thus, expressing the average probability of symbol error in terms of the ratio $E_b/N_0$, we
may write

$$P_e \approx 2Q\left(\sqrt{\frac{2E_b}{N_0}}\right)$$

With Gray encoding used for the incoming symbols, we find from (7.120) and (7.124) that
the BER of QPSK is exactly

$$\text{BER} = Q\left(\sqrt{\frac{2E_b}{N_0}}\right)$$

We may, therefore, state that a QPSK system achieves the same average probability of bit
error as a binary PSK system for the same bit rate and the same $E_b/N_0$, but uses only half
the channel bandwidth. Stated in another way:

For a prescribed performance, QPSK uses channel bandwidth better than binary PSK,
which explains the preferred use of QPSK over binary PSK in practice.

Earlier we stated that the binary PSK may be viewed as a special case of DSB-SC
modulation. In a corresponding way, we may view the QPSK as a special case of the
quadrature amplitude modulation (QAM) in analog modulation theory.

Power Spectra of QPSK Signals

Assume that the binary wave at the modulator input is random with symbols 1 and 0 being
equally likely, and with the symbols transmitted during adjacent time slots being
statistically independent. We then make the following observations pertaining to the
in-phase and quadrature components of a QPSK signal:

1. Depending on the dibit sent during the signaling interval $-T_b \le t \le T_b$, the in-phase
component equals $+g(t)$ or $-g(t)$, and similarly for the quadrature component. The
$g(t)$ denotes the symbol-shaping function defined by

$$g(t) = \begin{cases} \sqrt{\dfrac{E}{T}}, & 0 \le t \le T \\ 0, & \text{otherwise} \end{cases}$$

Hence, the in-phase and quadrature components have a common power spectral
density, namely, $E\, \mathrm{sinc}^2(Tf)$.

2. The in-phase and quadrature components are statistically independent. Accordingly,
the baseband power spectral density of the QPSK signal equals the sum of the
individual power spectral densities of the in-phase and quadrature components, so
we may write

$$S_B(f) = 2E\, \mathrm{sinc}^2(Tf) = 4E_b\, \mathrm{sinc}^2(2T_b f)$$

Figure 7.19 plots $S_B(f)$, normalized with respect to $4E_b$, versus the normalized frequency
$T_b f$. This figure also includes a plot of the baseband power spectral density of a certain
form of binary FSK called minimum shift keying, the evaluation of which is presented in
Section 7.8. Comparison of these two spectra is deferred to that section.

Figure 7.19 Power spectra of QPSK and MSK signals.
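
The bandwidth saving of QPSK relative to binary PSK can be read off the two baseband
spectra. The following sketch evaluates $2E_b\,\mathrm{sinc}^2(T_b f)$ for binary PSK and
$4E_b\,\mathrm{sinc}^2(2T_b f)$ for QPSK at the same bit duration; the parameter values are hypothetical
and the function names are illustrative.

```python
import math


def sinc(x):
    """Normalized sinc function sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def bpsk_baseband_psd(f, e_b, t_b):
    """Baseband power spectral density of binary PSK."""
    return 2.0 * e_b * sinc(t_b * f) ** 2


def qpsk_baseband_psd(f, e_b, t_b):
    """Baseband power spectral density of QPSK, expressed through the bit duration T_b."""
    return 4.0 * e_b * sinc(2.0 * t_b * f) ** 2


e_b, t_b = 1.0, 1.0  # hypothetical energy per bit and bit duration
for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f, bpsk_baseband_psd(f, e_b, t_b), qpsk_baseband_psd(f, e_b, t_b))
```

The first null of the QPSK spectrum falls at $f = 1/(2T_b)$, half the frequency of the first
null of the binary PSK spectrum at $f = 1/T_b$, in line with the statement that QPSK uses
only half the channel bandwidth.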


For a variation of the QPSK, consider the signal-space diagram of Figure 7.20a that 
embodies all the possible phase transitions that can arise in the generation of a QPSK 
signal. More specifically, examining the QPSK waveform illustrated in Figure 7.17 for 
Example 6, we may make three observations: 

1. The carrier phase changes by ±180° whenever both the in-phase and quadrature
components of the QPSK signal change sign. An example of this situation is
illustrated in Figure 7.17 when the input binary sequence switches from dibit 01 to
dibit 10.

2. The carrier phase changes by ±90° whenever the in-phase or quadrature component
changes sign. An example of this second situation is illustrated in Figure 7.17 when
the input binary sequence switches from dibit 10 to dibit 00, during which the
in-phase component changes sign, whereas the quadrature component is unchanged.

3. The carrier phase is unchanged when neither the in-phase component nor the
quadrature component changes sign. This last situation is illustrated in Figure 7.17
when dibit 10 is transmitted in two successive symbol intervals.

Situation 1 and, to a much lesser extent, situation 2 can be of particular concern when the
QPSK signal is filtered during the course of transmission, prior to detection. Specifically,
the 180° and 90° shifts in carrier phase can result in changes in the carrier amplitude (i.e.,
the envelope of the QPSK signal) during the course of transmission over the channel, thereby
causing additional symbol errors on detection at the receiver.

To mitigate this shortcoming of QPSK, we need to reduce the extent of its amplitude 
fluctuations. To this end, we may use offset QPSK. In this variant of QPSK, the bit stream 
responsible for generating the quadrature component is delayed (i.e., offset) by half a 
symbol interval with respect to the bit stream responsible for generating the in-phase 
component. Specifically, the two basis functions of offset QPSK are defined by 

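\[ \phi_1(t) = \sqrt{\frac{2}{T}} \cos(2\pi f_c t), \qquad 0 \le t \le T \tag{7.129} \]

and

\[ \phi_2(t) = \sqrt{\frac{2}{T}} \sin(2\pi f_c t), \qquad \frac{T}{2} \le t \le \frac{3T}{2} \tag{7.130} \]

The $\phi_1(t)$ of (7.129) is exactly the same as that of (7.114) for QPSK, but the $\phi_2(t)$ of
(7.130) is different from that of (7.115) for QPSK. Accordingly, unlike QPSK, the phase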
transitions likely to occur in offset QPSK are confined to ±90°, as indicated in the signal- 
space diagram of Figure 7.20b. However, ±90° phase transitions in offset QPSK occur 
twice as frequently but with half the intensity encountered in QPSK. Since, in addition to 
±90° phase transitions, ±180° phase transitions also occur in QPSK, we find that 
amplitude fluctuations in offset QPSK due to filtering have a smaller amplitude than in the 
case of QPSK. 
Despite the delay $T/2$ applied to the basis function $\phi_2(t)$ in (7.130) compared with that
in (7.115) for QPSK, the offset QPSK has exactly the same probability of symbol error in
an AWGN channel as QPSK. The equivalence in noise performance between these PSK
schemes assumes the use of coherent detection at the receiver. The reason for the
equivalence is that the statistical independence of the in-phase and quadrature components
applies to both QPSK and offset QPSK. We may, therefore, say that Equation (7.123) for
the average probability of symbol error applies equally well to the offset QPSK.
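The restriction of offset QPSK to ±90° phase transitions can be checked with a short sketch. The code below is illustrative only: it assumes the in-phase and quadrature bit streams are mapped to amplitudes ±1, samples them on a half-symbol grid, and the function names (`phase_deg`, `phase_jumps`) are hypothetical helpers rather than anything defined in the text.

```python
import math
import random

def phase_deg(i, q):
    """Carrier phase (degrees) of the message point (i, q), with i, q in {+1, -1}."""
    return math.degrees(math.atan2(q, i))

def phase_jumps(i_bits, q_bits, offset):
    """Absolute phase jumps (degrees) between successive half-symbol slots.

    i_bits, q_bits: sequences of +1/-1, one entry per symbol interval.
    offset: if True, the quadrature stream is delayed by half a symbol (offset QPSK).
    """
    # Sample each stream twice per symbol (half-symbol resolution).
    i_half = [b for b in i_bits for _ in range(2)]
    q_half = [b for b in q_bits for _ in range(2)]
    if offset:
        q_half = [q_half[0]] + q_half[:-1]      # delay by one half-symbol slot
    jumps = []
    for k in range(1, len(i_half)):
        d = phase_deg(i_half[k], q_half[k]) - phase_deg(i_half[k - 1], q_half[k - 1])
        d = abs((d + 180.0) % 360.0 - 180.0)    # wrap the jump into [0, 180]
        jumps.append(round(d))
    return jumps

random.seed(1)
i_bits = [random.choice((-1, 1)) for _ in range(200)]
q_bits = [random.choice((-1, 1)) for _ in range(200)]
print("QPSK jump sizes: ", sorted(set(phase_jumps(i_bits, q_bits, offset=False))))
print("OQPSK jump sizes:", sorted(set(phase_jumps(i_bits, q_bits, offset=True))))
```

With the offset applied, the two streams can never change sign in the same half-symbol slot, so a 180° jump cannot occur; without it, a simultaneous sign change of both components produces the 180° transition of situation 1.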


QPSK is a special case of the generic form of PSK commonly referred to as M-ary PSK,
where the phase of the carrier takes on one of M possible values: $\theta_i = 2(i - 1)\pi/M$, where
$i = 1, 2, \ldots, M$. Accordingly, during each signaling interval of duration T, one of the M
possible signals

\[ s_i(t) = \sqrt{\frac{2E}{T}} \cos\left[2\pi f_c t + \frac{2\pi}{M}(i - 1)\right], \qquad i = 1, 2, \ldots, M \]

is sent, where E is the signal energy per symbol. The carrier frequency $f_c = n_c/T$ for some
fixed integer $n_c$.

Each $s_i(t)$ may be expanded in terms of the same two basis functions $\phi_1(t)$ and $\phi_2(t)$; the
signal constellation of M-ary PSK is, therefore, two-dimensional. The M message points
are equally spaced on a circle of radius $\sqrt{E}$ and center at the origin, as illustrated in Figure
7.21a for the case of octaphase-shift-keying (i.e., M = 8).

From Figure 7.21a we see that the signal-space diagram is circularly symmetric. We
may, therefore, apply (7.85), based on the union bound, to develop an approximate formula
for the average probability of symbol error for M-ary PSK. Suppose that the transmitted
signal corresponds to the message point $m_1$, whose coordinates along the $\phi_1$- and $\phi_2$-axes are
$+\sqrt{E}$ and 0, respectively. Suppose that the ratio $E/N_0$ is large enough to consider the nearest
two message points, one on either side of $m_1$, as potential candidates for being mistaken for
$m_1$ due to channel noise. This is illustrated in Figure 7.21b for the case of M = 8. The
Euclidean distance for each of these two points from $m_1$ is (for M = 8)

\[ d_{12} = d_{18} = 2\sqrt{E} \sin\left(\frac{\pi}{M}\right) \]

Hence, the use of (7.85) yields the average probability of symbol error for coherent M-ary
PSK as

\[ P_e \approx 2Q\left(\sqrt{\frac{2E}{N_0}} \sin\left(\frac{\pi}{M}\right)\right) \tag{7.132} \]

where it is assumed that $M \ge 4$. The approximation becomes extremely tight for fixed M, as
$E/N_0$ is increased. For M = 4, (7.132) reduces to the same form given in (7.123) for QPSK.
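As a numerical illustration of (7.132), the following sketch evaluates the approximation for a few values of M. It assumes the Q-function is computed from the complementary error function as $Q(x) = \tfrac{1}{2}\mathrm{erfc}(x/\sqrt{2})$; the function names and the 15 dB operating point are chosen here purely for illustration.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mpsk_symbol_error(M, E_over_N0):
    """Approximation (7.132): P_e ~ 2 Q( sqrt(2 E / N0) * sin(pi / M) ), for M >= 4."""
    return 2.0 * q_function(math.sqrt(2.0 * E_over_N0) * math.sin(math.pi / M))

E_over_N0 = 10 ** (15 / 10)      # E/N0 of 15 dB, an arbitrary illustrative value
for M in (4, 8, 16):
    print(M, mpsk_symbol_error(M, E_over_N0))
```

For a fixed $E/N_0$ the argument of the Q-function shrinks as M grows, so the computed error probability increases with M.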


Power Spectra of M-ary PSK Signals 

The symbol duration of M-ary PSK is defined by

\[ T = T_b \log_2 M \tag{7.133} \]

where $T_b$ is the bit duration. Proceeding in a manner similar to that described for a QPSK
signal, we may show that the baseband power spectral density of an M-ary PSK signal is
given by

\[ S_B(f) = 2E\,\mathrm{sinc}^2(Tf) = 2E_b (\log_2 M)\,\mathrm{sinc}^2(T_b f \log_2 M) \tag{7.134} \]

Figure 7.22 is a plot of the normalized power spectral density $S_B(f)/2E_b$ versus the
normalized frequency $T_b f$ for three different values of M, namely M = 2, 4, 8. Equation
(7.134) includes (7.111) for M = 2 and (7.128) for M = 4 as two special cases.

Figure 7.21: (a) Signal-space diagram for octaphase-shift keying (i.e., M = 8). The
decision boundaries are shown as dashed lines. (b) Signal-space diagram illustrating
the application of the union bound for octaphase-shift keying.

The baseband power spectra of M-ary PSK signals plotted in Figure 7.22 possess a
main lobe bounded by well-defined spectral nulls (i.e., frequencies at which the power
spectral density is zero). In light of the discussion on the bandwidth of signals presented in
Chapter 2, we may use the main lobe as a basis for bandwidth assessment. Accordingly,
invoking the notion of null-to-null bandwidth, we may say that the spectral width of the
main lobe provides a simple, yet informative, measure for the bandwidth of M-ary PSK
signals. Most importantly, a large fraction of the average signal power is contained inside
the main lobe. On this basis, we may define the channel bandwidth required to pass M-ary
PSK signals through an analog channel as

\[ B = \frac{2}{T} \tag{7.135} \]

where T is the symbol duration. But the symbol duration T is related to the bit duration $T_b$
by (7.133). Moreover, the bit rate $R_b = 1/T_b$. Hence, we may redefine the channel
bandwidth of (7.135) in terms of the bit rate as

\[ B = \frac{2R_b}{\log_2 M} \tag{7.136} \]

Figure 7.22: Power spectra of M-ary PSK signals for M = 2, 4, 8.
Based on this formula, the bandwidth efficiency of M-ary PSK signals is given by

\[ \rho = \frac{R_b}{B} = \frac{\log_2 M}{2} \tag{7.137} \]

Table 7.3 gives the values of $\rho$ calculated from (7.137) for varying M. In light of (7.132)
and Table 7.3, we now make the statement:

As the number of states M in M-ary PSK is increased, the bandwidth efficiency is improved
at the expense of error performance.

However, note that if we are to ensure that there is no degradation in error performance,
we have to increase $E_b/N_0$ to compensate for the increase in M.

Table 7.3: Bandwidth efficiency of M-ary PSK signals

M                        2      4      8      16     32     64
$\rho$ (bit/(s/Hz))      0.5    1      1.5    2      2.5    3
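The entries of Table 7.3 follow directly from (7.136) and (7.137). A minimal sketch, with function names and the 1 Mbit/s bit rate chosen here only for illustration:

```python
import math

def null_to_null_bandwidth(M, Rb):
    """Channel bandwidth B = 2 Rb / log2(M) of (7.136); Rb in bit/s, B in Hz."""
    return 2.0 * Rb / math.log2(M)

def bandwidth_efficiency(M):
    """Bandwidth efficiency rho = Rb / B = log2(M) / 2 of (7.137)."""
    return math.log2(M) / 2.0

for M in (2, 4, 8, 16, 32, 64):
    print(M, bandwidth_efficiency(M), null_to_null_bandwidth(M, Rb=1.0e6))
```

Each doubling of M adds 0.5 to $\rho$, while the required bandwidth for a fixed bit rate falls in proportion to $1/\log_2 M$.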


M - ary Quadrature Amplitude Modulation 


In an M-ary PSK system, the in-phase and quadrature components of the modulated signal
are interrelated in such a way that the envelope is constrained to remain constant. This
constraint manifests itself in a circular constellation for the message points, as illustrated
in Figure 7.21a. However, if this constraint is removed so as to permit the in-phase and
quadrature components to be independent, we get a new modulation scheme called M-ary
QAM. QAM is a hybrid form of modulation, in that the carrier experiences amplitude
as well as phase modulation.

In M-ary PAM, the signal-space diagram is one-dimensional. M-ary QAM is a
two-dimensional generalization of M-ary PAM, in that its formulation involves two orthogonal
passband basis functions:

\[ \phi_1(t) = \sqrt{\frac{2}{T}} \cos(2\pi f_c t), \qquad 0 \le t \le T \]

\[ \phi_2(t) = \sqrt{\frac{2}{T}} \sin(2\pi f_c t), \qquad 0 \le t \le T \]

Let $d_{\min}$ denote the minimum distance between any two message points in the QAM
constellation. Then, the projections of the ith message point on the $\phi_1$- and $\phi_2$-axes are
respectively defined by $a_i d_{\min}/2$ and $b_i d_{\min}/2$, where $i = 1, 2, \ldots, M$. With the separation
between two message points in the signal-space diagram being proportional to the square
root of energy, we may therefore set

\[ \frac{d_{\min}}{2} = \sqrt{E_0} \]

where $E_0$ is the energy of the message signal with the lowest amplitude. The transmitted
M-ary QAM signal for symbol k can now be defined in terms of $E_0$:

\[ s_k(t) = \sqrt{\frac{2E_0}{T}}\, a_k \cos(2\pi f_c t) - \sqrt{\frac{2E_0}{T}}\, b_k \sin(2\pi f_c t), \qquad 0 \le t \le T, \quad k = 0, \pm 1, \pm 2, \ldots \]

The signal $s_k(t)$ involves two phase-quadrature carriers, each one of which is modulated by
a set of discrete amplitudes; hence the terminology “quadrature amplitude modulation.”

In M-ary QAM, the constellation of message points depends on the number of possible
symbols, M. In what follows, we consider the case of square constellations, for which the
number of bits per symbol is even.


QAM Square Constellations 

With an even number of bits per symbol, we write

\[ L = \sqrt{M}, \qquad L: \text{positive integer} \]

Under this condition, an M-ary QAM square constellation can always be viewed as the
Cartesian product of a one-dimensional L-ary PAM constellation with itself. By definition,
the Cartesian product of two sets of coordinates (representing a pair of one-dimensional
constellations) is made up of the set of all possible ordered pairs of coordinates with the
first coordinate in each such pair being taken from the first set involved in the product and
the second coordinate taken from the second set in the product.

Thus, the ordered pairs of coordinates naturally form a square matrix, as shown by

\[ \{a_i, b_i\} = \begin{bmatrix}
(-L+1,\, L-1) & (-L+3,\, L-1) & \cdots & (L-1,\, L-1) \\
(-L+1,\, L-3) & (-L+3,\, L-3) & \cdots & (L-1,\, L-3) \\
\vdots & \vdots & & \vdots \\
(-L+1,\, -L+1) & (-L+3,\, -L+1) & \cdots & (L-1,\, -L+1)
\end{bmatrix} \]

To calculate the probability of symbol error for this M-ary QAM, we exploit the following
property:

An M-ary QAM square constellation can be factored into the product of the corresponding
L-ary PAM constellation with itself.

To exploit this statement, we may proceed in one of two ways:

Approach 1. We start with a signal constellation of the M-ary PAM for a prescribed M,
and then build on it to construct the corresponding signal constellation of the M-ary QAM.

Approach 2. We start with a signal constellation of the M-ary QAM, and then use it to
construct the corresponding orthogonal M-ary PAMs.

In the example to follow, we present a systematic procedure based on Approach 1.

M-ary QAM for M = 16

In Figure 7.23, we have constructed two signal constellations for the 4-ary PAM, one
vertically oriented along the $\phi_2$-axis in part a of the figure, and the other horizontally
oriented along the $\phi_1$-axis in part b of the figure. These two parts are spatially orthogonal
to each other, accounting for the two-dimensional structure of the M-ary QAM. In
developing this structure, the following points should be borne in mind:

• The same binary sequence is used for both 4-ary PAM constellations.

• The Gray encoding rule is applied, which means that as we move from one
codeword to an adjacent one, only a single bit is changed.

• In constructing the 16-ary QAM constellation, we move from one quadrant to the
next in a counterclockwise direction.

With four quadrants constituting the 16-ary QAM, we proceed in four stages as follows:

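Stage 1. First-quadrant constellation. Referring to Figure 7.23, we use the codewords
along the positive parts of the $\phi_2$- and $\phi_1$-axes, respectively, to write

\[ \underbrace{\begin{bmatrix} 11 \\ 10 \end{bmatrix}}_{\text{top to bottom}}
\underbrace{\begin{bmatrix} 10 & 11 \end{bmatrix}}_{\text{left to right}}
\;\rightarrow\;
\underbrace{\begin{bmatrix} 1110 & 1111 \\ 1010 & 1011 \end{bmatrix}}_{\text{first quadrant}} \]

Stage 2. Second-quadrant constellation. Following the same procedure as in Stage 1, we
write

\[ \underbrace{\begin{bmatrix} 11 \\ 10 \end{bmatrix}}_{\text{top to bottom}}
\underbrace{\begin{bmatrix} 01 & 00 \end{bmatrix}}_{\text{left to right}}
\;\rightarrow\;
\underbrace{\begin{bmatrix} 1101 & 1100 \\ 1001 & 1000 \end{bmatrix}}_{\text{second quadrant}} \]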

Figure 7.23: The two orthogonal constellations of the 4-ary PAM. (a) Vertically oriented
constellation, with codewords 11, 10, 00, 01 at $3d/2$, $d/2$, $-d/2$, $-3d/2$ on the $\phi_2$-axis.
(b) Horizontally oriented constellation, with codewords 01, 00, 10, 11 at $-3d/2$, $-d/2$,
$d/2$, $3d/2$ on the $\phi_1$-axis. As mentioned in the text, we move top-down along the
$\phi_2$-axis and from left to right along the $\phi_1$-axis.
Stage 3. Third-quadrant constellation. Again, following the same procedure as before,
we next write

\[ \underbrace{\begin{bmatrix} 00 \\ 01 \end{bmatrix}}_{\text{top to bottom}}
\underbrace{\begin{bmatrix} 01 & 00 \end{bmatrix}}_{\text{left to right}}
\;\rightarrow\;
\underbrace{\begin{bmatrix} 0001 & 0000 \\ 0101 & 0100 \end{bmatrix}}_{\text{third quadrant}} \]

Stage 4. Fourth-quadrant constellation. Finally, we write

\[ \underbrace{\begin{bmatrix} 00 \\ 01 \end{bmatrix}}_{\text{top to bottom}}
\underbrace{\begin{bmatrix} 10 & 11 \end{bmatrix}}_{\text{left to right}}
\;\rightarrow\;
\underbrace{\begin{bmatrix} 0010 & 0011 \\ 0110 & 0111 \end{bmatrix}}_{\text{fourth quadrant}} \]

The final step is to piece together these four constituent quadrant constellations to
construct the 16-ary QAM constellation as described in Figure 7.24. The important point
to note here is that all the codewords in Figure 7.24 obey the Gray encoding rule, not only
within each quadrant but also as we move from one quadrant to the next.
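The Gray property claimed for the assembled constellation can be verified by brute force. The sketch below rebuilds the 4-by-4 grid of quadbits by concatenating the vertical and horizontal 4-ary PAM codewords of Figure 7.23 and checks that every pair of horizontally or vertically adjacent points differs in exactly one bit. The variable and function names are illustrative choices, not notation from the text.

```python
# Gray-coded 4-ary PAM labels, as read from Figure 7.23.
VERTICAL = ["11", "10", "00", "01"]      # phi_2-axis, top (3d/2) to bottom (-3d/2)
HORIZONTAL = ["01", "00", "10", "11"]    # phi_1-axis, left (-3d/2) to right (3d/2)

# Each quadbit is the vertical dibit followed by the horizontal dibit.
GRID = [[v + h for h in HORIZONTAL] for v in VERTICAL]

def hamming(u, v):
    """Number of bit positions in which the codewords u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def is_gray_mapped(grid):
    """True if all horizontally and vertically adjacent codewords differ in one bit."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and hamming(grid[r][c], grid[r][c + 1]) != 1:
                return False
            if r + 1 < rows and hamming(grid[r][c], grid[r + 1][c]) != 1:
                return False
    return True

for row in GRID:
    print(" ".join(row))
print("Gray mapped:", is_gray_mapped(GRID))
```

The top-right 2-by-2 block of `GRID` reproduces the first-quadrant matrix of Stage 1, and the remaining blocks reproduce Stages 2 to 4.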


Figure 7.24: (a) Signal-space diagram of M-ary QAM for M = 16; the message points in each
quadrant are identified with Gray-encoded quadbits. The codewords are arranged as
follows, with the $\phi_1$-coordinate running across and the $\phi_2$-coordinate running down:

$\phi_2 \backslash \phi_1$     $-3d/2$    $-d/2$    $d/2$     $3d/2$
$3d/2$                         1101       1100      1110      1111
$d/2$                          1001       1000      1010      1011
$-d/2$                         0001       0000      0010      0011
$-3d/2$                        0101       0100      0110      0111

Average Probability of Error 

In light of the equivalence established between the M-ary QAM and M-ary PAM, we may
formulate the average probability of error of the M-ary QAM by proceeding as follows:

1. The probability of correct detection for M-ary QAM is written as

\[ P_c = (1 - P'_e)^2 \tag{7.143} \]

where $P'_e$ is the probability of symbol error for the L-ary PAM.

2. With $L = \sqrt{M}$, the probability of symbol error $P'_e$ is itself defined by

\[ P'_e = 2\left(1 - \frac{1}{\sqrt{M}}\right) Q\left(\sqrt{\frac{2E_0}{N_0}}\right) \tag{7.144} \]

3. The probability of symbol error for M-ary QAM is given by

\[ P_e = 1 - P_c = 1 - (1 - P'_e)^2 \approx 2P'_e \tag{7.145} \]

where it is assumed that $P'_e$ is small enough compared with unity to justify ignoring
the quadratic term.

Hence, using (7.143) and (7.144) in (7.145), we find that the probability of symbol error
for M-ary QAM is approximately given by

\[ P_e \approx 4\left(1 - \frac{1}{\sqrt{M}}\right) Q\left(\sqrt{\frac{2E_0}{N_0}}\right) \tag{7.146} \]

The transmitted energy in M-ary QAM is variable, in that its instantaneous value naturally
depends on the particular symbol transmitted. Therefore, it is more logical to express $P_e$ in
terms of the average value of the transmitted energy rather than $E_0$. Assuming that the L
amplitude levels of the in-phase or quadrature component of the M-ary QAM signal are
equally likely, we have

\[ E_{\mathrm{av}} = 2\left[\frac{2E_0}{L} \sum_{i=1}^{L/2} (2i - 1)^2\right] \tag{7.147} \]

where the overall scaling factor 2 accounts for the equal contributions made by the in-phase
and quadrature components. The limits of the summation and the scaling factor 2 inside the
large parentheses account for the symmetric nature of the pertinent amplitude levels around
zero. Summing the series in (7.147), we get

\[ E_{\mathrm{av}} = \frac{2(L^2 - 1)E_0}{3} = \frac{2(M - 1)E_0}{3} \]

Accordingly, we may rewrite (7.146) in terms of $E_{\mathrm{av}}$ as

\[ P_e \approx 4\left(1 - \frac{1}{\sqrt{M}}\right) Q\left(\sqrt{\frac{3E_{\mathrm{av}}}{(M - 1)N_0}}\right) \tag{7.150} \]

which is the desired result.

The case of M = 4 is of special interest. The signal constellation for this particular value
of M is the same as that for QPSK. Indeed, putting M = 4 in (7.150) and noting that, for
this special case, $E_{\mathrm{av}}$ equals E, where E is the energy per symbol, we find that the resulting
formula for the probability of symbol error becomes identical to that in (7.123) for QPSK;
and so it should.
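The series for the average energy and the final error formula can be checked numerically. The sketch below assumes $Q(x) = \tfrac{1}{2}\mathrm{erfc}(x/\sqrt{2})$ and uses illustrative function names; it evaluates the sum in (7.147) directly and compares it with the closed form $2(M-1)E_0/3$.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def average_energy(M, E0):
    """E_av of (7.147), summed directly over the L/2 positive amplitude levels."""
    L = math.isqrt(M)
    per_dimension = (2.0 * E0 / L) * sum((2 * i - 1) ** 2 for i in range(1, L // 2 + 1))
    return 2.0 * per_dimension          # in-phase plus quadrature contributions

def qam_symbol_error(M, Eav_over_N0):
    """Approximation (7.150): P_e ~ 4 (1 - 1/sqrt(M)) Q( sqrt(3 E_av / ((M-1) N0)) )."""
    return 4.0 * (1.0 - 1.0 / math.sqrt(M)) * q_function(
        math.sqrt(3.0 * Eav_over_N0 / (M - 1)))

for M in (4, 16, 64):
    closed_form = 2.0 * (M - 1) / 3.0   # 2 (M - 1) E0 / 3 with E0 = 1
    print(M, average_energy(M, 1.0), closed_form, qam_symbol_error(M, 100.0))
```

For M = 4 the factor $4(1 - 1/\sqrt{M})$ equals 2 and the argument of the Q-function reduces to $\sqrt{E_{\mathrm{av}}/N_0}$, which is the QPSK special case noted in the text.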


Frequency-Shift Keying Techniques Using Coherent Detection 


M-ary PSK and M-ary QAM share a common property: both of them are examples of
linear modulation. In this section, we study a nonlinear method of modulation known as
FSK using coherent detection. We begin the study by considering the simple case of
binary FSK, for which M = 2.

In binary FSK, symbols 1 and 0 are distinguished from each other by transmitting one of
two sinusoidal waves that differ in frequency by a fixed amount. A typical pair of
sinusoidal waves is described by

\[ s_i(t) = \begin{cases} \sqrt{\dfrac{2E_b}{T_b}} \cos(2\pi f_i t), & 0 \le t \le T_b \\ 0, & \text{elsewhere} \end{cases} \tag{7.151} \]

where i = 1, 2 and $E_b$ is the transmitted signal energy per bit; the transmitted frequency is
set at

\[ f_i = \frac{n_c + i}{T_b} \quad \text{for some fixed integer } n_c \text{ and } i = 1, 2 \tag{7.152} \]

Symbol 1 is represented by $s_1(t)$ and symbol 0 by $s_2(t)$. The FSK signal described here is
known as Sunde’s FSK. It is a continuous-phase signal, in the sense that phase continuity
is always maintained, including the inter-bit switching times.

From (7.151) and (7.152), we observe directly that the signals $s_1(t)$ and $s_2(t)$ are
orthogonal, but not normalized to have unit energy. The most useful form for the set of
orthonormal basis functions is described by

\[ \phi_i(t) = \begin{cases} \sqrt{\dfrac{2}{T_b}} \cos(2\pi f_i t), & 0 \le t \le T_b \\ 0, & \text{elsewhere} \end{cases} \]

where i = 1, 2. Correspondingly, the coefficient $s_{ij}$, where i = 1, 2 and j = 1, 2, is defined
by

\[ s_{ij} = \int_0^{T_b} s_i(t)\,\phi_j(t)\,dt = \int_0^{T_b} \sqrt{\frac{2E_b}{T_b}} \cos(2\pi f_i t)\, \sqrt{\frac{2}{T_b}} \cos(2\pi f_j t)\,dt \tag{7.154} \]

Carrying out the integration in (7.154), the formula for $s_{ij}$ simplifies to

\[ s_{ij} = \begin{cases} \sqrt{E_b}, & i = j \\ 0, & i \ne j \end{cases} \]

Thus, unlike binary PSK, binary FSK is characterized by having a signal-space diagram
that is two-dimensional (i.e., N = 2) with two message points (i.e., M = 2), as shown in
Figure 7.25. The two message points are defined by the vectors

\[ \mathbf{s}_1 = \begin{bmatrix} \sqrt{E_b} \\ 0 \end{bmatrix} \]

and

\[ \mathbf{s}_2 = \begin{bmatrix} 0 \\ \sqrt{E_b} \end{bmatrix} \]

The Euclidean distance $\|\mathbf{s}_1 - \mathbf{s}_2\|$ is equal to $\sqrt{2E_b}$. Figure 7.25 also includes a couple of
waveforms representative of signals $s_1(t)$ and $s_2(t)$.

Figure 7.25: Signal-space diagram for binary FSK system. The diagram also includes example
waveforms of the two modulated signals $s_1(t)$ and $s_2(t)$.
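The orthogonality of the two Sunde FSK signals, and the resulting signal-space coordinates, can be confirmed by evaluating the integral (7.154) numerically. The sketch below uses a simple midpoint rule; the choices $E_b = 1$, $T_b = 1$, $n_c = 4$, the step count, and the function name are all assumptions made for illustration.

```python
import math

def s_coefficient(i, j, Eb=1.0, Tb=1.0, nc=4, steps=20000):
    """Midpoint-rule estimate of s_ij = integral over [0, Tb] of s_i(t) * phi_j(t) dt."""
    f = {1: (nc + 1) / Tb, 2: (nc + 2) / Tb}     # f_i = (n_c + i) / T_b, as in (7.152)
    dt = Tb / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        s_i = math.sqrt(2.0 * Eb / Tb) * math.cos(2.0 * math.pi * f[i] * t)
        phi_j = math.sqrt(2.0 / Tb) * math.cos(2.0 * math.pi * f[j] * t)
        total += s_i * phi_j * dt
    return total

s1 = (s_coefficient(1, 1), s_coefficient(1, 2))   # message point for symbol 1
s2 = (s_coefficient(2, 1), s_coefficient(2, 2))   # message point for symbol 0
print("s1 =", s1)
print("s2 =", s2)
print("distance =", math.hypot(s1[0] - s2[0], s1[1] - s2[1]))
```

With $E_b = 1$ the two message points come out close to (1, 0) and (0, 1), and the distance between them is close to $\sqrt{2}$, in agreement with $\sqrt{2E_b}$.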

Generation and Coherent Detection of Binary FSK Signals 

The block diagram of Figure 7.26a describes a scheme for generating the binary FSK 
signal; it consists of two components: 

1. On-off level encoder, the output of which is a constant amplitude of $\sqrt{E_b}$ in
response to input symbol 1 and zero in response to input symbol 0.

2. Pair of oscillators, whose frequencies $f_1$ and $f_2$ differ by an integer multiple of the
bit rate $1/T_b$ in accordance with (7.152). The lower oscillator with frequency $f_2$ is
preceded by an inverter. When, in a signaling interval, the input symbol is 1, the
upper oscillator with frequency $f_1$ is switched on and signal $s_1(t)$ is transmitted,
while the lower oscillator is switched off. On the other hand, when the input symbol
is 0, the upper oscillator is switched off, while the lower oscillator is switched on
and signal $s_2(t)$ with frequency $f_2$ is transmitted. With phase continuity as a
requirement, the two oscillators are synchronized with each other. Alternatively, we
may use a voltage-controlled oscillator, in which case phase continuity is
automatically satisfied.

To coherently detect the original binary sequence given the noisy received signal $x(t)$, we
may use the receiver shown in Figure 7.26b. It consists of two correlators with a common
input, which are supplied with locally generated coherent reference signals $\phi_1(t)$ and $\phi_2(t)$.
The correlator outputs are then subtracted, one from the other; the resulting difference y is
then compared with a threshold of zero. If y > 0, the receiver decides in favor of 1. On the
other hand, if y < 0, it decides in favor of 0. If y is exactly zero, the receiver makes a
random guess (i.e., flip of a fair coin) in favor of 1 or 0.

Figure 7.26: Block diagram for (a) binary FSK transmitter and (b) coherent binary FSK receiver.
In part (a), the oscillators supply $\phi_1(t) = \sqrt{2/T_b}\cos(2\pi f_1 t)$ and
$\phi_2(t) = \sqrt{2/T_b}\cos(2\pi f_2 t)$, and the output is the binary FSK signal $s(t)$; in part (b),
the decision device chooses 1 if y > 0 and 0 if y < 0.


Error Probability of Binary FSK 

The observation vector $\mathbf{x}$ has two elements $x_1$ and $x_2$ that are defined by, respectively,

\[ x_1 = \int_0^{T_b} x(t)\,\phi_1(t)\,dt \]

and

\[ x_2 = \int_0^{T_b} x(t)\,\phi_2(t)\,dt \]

where $x(t)$ is the received signal, whose form depends on which symbol was transmitted.
Given that symbol 1 was transmitted, $x(t)$ equals $s_1(t) + w(t)$, where $w(t)$ is the sample
function of a white Gaussian noise process of zero mean and power spectral density $N_0/2$.
If, on the other hand, symbol 0 was transmitted, $x(t)$ equals $s_2(t) + w(t)$.

Now, applying the decision rule of (7.57) assuming the use of coherent detection at the
receiver, we find that the observation space is partitioned into two decision regions,
labeled $Z_1$ and $Z_2$ in Figure 7.25. The decision boundary, separating region $Z_1$ from region
$Z_2$, is the perpendicular bisector of the line joining the two message points. The receiver
decides in favor of symbol 1 if the received signal point represented by the observation
vector $\mathbf{x}$ falls inside region $Z_1$. This occurs when $x_1 > x_2$. If, on the other hand, we have
$x_1 < x_2$, the received signal point falls inside region $Z_2$ and the receiver decides in favor of
symbol 0. On the decision boundary, we have $x_1 = x_2$, in which case the receiver makes a
random guess in favor of symbol 1 or 0.

To proceed further, we define a new Gaussian random variable Y whose sample value y
is equal to the difference between $x_1$ and $x_2$; that is,

\[ y = x_1 - x_2 \]


The mean value of the random variable Y depends on which binary symbol was
transmitted. Given that symbol 1 was sent, the Gaussian random variables $X_1$ and $X_2$,
whose sample values are denoted by $x_1$ and $x_2$, have mean values equal to $\sqrt{E_b}$ and zero,
respectively. Correspondingly, the conditional mean of the random variable Y given that
symbol 1 was sent is

\[ E[Y \mid 1] = E[X_1 \mid 1] - E[X_2 \mid 1] = +\sqrt{E_b} \]

On the other hand, given that symbol 0 was sent, the random variables $X_1$ and $X_2$ have
mean values equal to zero and $\sqrt{E_b}$, respectively. Correspondingly, the conditional mean
of the random variable Y given that symbol 0 was sent is

\[ E[Y \mid 0] = E[X_1 \mid 0] - E[X_2 \mid 0] = -\sqrt{E_b} \]

The variance of the random variable Y is independent of which binary symbol was sent.
Since the random variables $X_1$ and $X_2$ are statistically independent, each with a variance
equal to $N_0/2$, it follows that

\[ \mathrm{var}[Y] = \mathrm{var}[X_1] + \mathrm{var}[X_2] = N_0 \]

Suppose we know that symbol 0 was sent. The conditional probability density function of
the random variable Y is then given by

\[ f_Y(y \mid 0) = \frac{1}{\sqrt{2\pi N_0}} \exp\left[-\frac{(y + \sqrt{E_b})^2}{2N_0}\right] \]

Since the condition $x_1 > x_2$ or, equivalently, $y > 0$ corresponds to the receiver making a
decision in favor of symbol 1, we deduce that the conditional probability of error given
that symbol 0 was sent is

\[ p_{10} = P(y > 0 \mid \text{symbol 0 was sent}) = \int_0^{\infty} f_Y(y \mid 0)\,dy
= \frac{1}{\sqrt{2\pi N_0}} \int_0^{\infty} \exp\left[-\frac{(y + \sqrt{E_b})^2}{2N_0}\right] dy \tag{7.165} \]


To put the integral in (7.165) in a standard form involving the Q-function, we set

\[ \frac{y + \sqrt{E_b}}{\sqrt{N_0}} = z \tag{7.166} \]

Then, changing the variable of integration from y to z, we may rewrite (7.165) as

\[ p_{10} = \frac{1}{\sqrt{2\pi}} \int_{\sqrt{E_b/N_0}}^{\infty} \exp\left(-\frac{z^2}{2}\right) dz = Q\left(\sqrt{\frac{E_b}{N_0}}\right) \tag{7.167} \]

Similarly, we may show that $p_{01}$, the conditional probability of error given that symbol 1
was sent, has the same value as in (7.167). Accordingly, averaging $p_{10}$ and $p_{01}$ and
assuming equiprobable symbols, we find that the average probability of bit error or,
equivalently, the BER for binary FSK using coherent detection is

\[ P_e = Q\left(\sqrt{\frac{E_b}{N_0}}\right) \tag{7.168} \]
Comparing (7.108) and (7.168), we see that for a binary FSK receiver to maintain the
same BER as in a binary PSK receiver, the bit energy-to-noise density ratio, $E_b/N_0$, has to
be doubled. This result is in perfect accord with the signal-space diagrams of Figures 7.13
and 7.25, where we see that in a binary PSK system the Euclidean distance between the
two message points is equal to $2\sqrt{E_b}$, whereas in a binary FSK system the corresponding
distance is $\sqrt{2E_b}$. For a prescribed $E_b$, the minimum distance $d_{\min}$ in binary PSK is,
therefore, $\sqrt{2}$ times that in binary FSK. Recall from (7.89) that the probability of error
decreases exponentially as $d_{\min}^2$; hence the difference between (7.108) and (7.168).
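The factor-of-two (3 dB) penalty of coherent binary FSK relative to binary PSK can be seen numerically. The sketch below takes the binary PSK result to be $Q(\sqrt{2E_b/N_0})$, which is what the doubling argument in the text implies for (7.108), and the binary FSK result to be (7.168); the function names are illustrative.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_bpsk(Eb_over_N0):
    """BER of coherent binary PSK, taken here as Q( sqrt(2 Eb / N0) )."""
    return q_function(math.sqrt(2.0 * Eb_over_N0))

def ber_bfsk(Eb_over_N0):
    """BER of coherent binary FSK, (7.168): Q( sqrt(Eb / N0) )."""
    return q_function(math.sqrt(Eb_over_N0))

for snr_db in (4, 8, 12):
    snr = 10 ** (snr_db / 10)
    # Binary FSK evaluated at twice the Eb/N0 matches binary PSK at Eb/N0.
    print(snr_db, ber_bpsk(snr), ber_bfsk(snr), ber_bfsk(2.0 * snr))
```

In each printed row the second and fourth values coincide, which is the statement that binary FSK needs twice the $E_b/N_0$ of binary PSK for the same BER.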


Power Spectra of Binary FSK Signals 

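Consider the case of Sunde’s FSK, for which the two transmitted frequencies $f_1$ and $f_2$
differ by an amount equal to the bit rate $1/T_b$, and their arithmetic mean equals the nominal
carrier frequency $f_c$; as mentioned previously, phase continuity is always maintained,
including inter-bit switching times. We may express this special binary FSK signal as a
frequency-modulated signal, defined by

\[ s(t) = \sqrt{\frac{2E_b}{T_b}} \cos\left(2\pi f_c t \pm \frac{\pi t}{T_b}\right), \qquad 0 \le t \le T_b \tag{7.169} \]

Using a well-known trigonometric identity, we may reformulate $s(t)$ in the expanded form

\[ \begin{aligned}
s(t) &= \sqrt{\frac{2E_b}{T_b}} \cos\left(\pm\frac{\pi t}{T_b}\right) \cos(2\pi f_c t) - \sqrt{\frac{2E_b}{T_b}} \sin\left(\pm\frac{\pi t}{T_b}\right) \sin(2\pi f_c t) \\
&= \sqrt{\frac{2E_b}{T_b}} \cos\left(\frac{\pi t}{T_b}\right) \cos(2\pi f_c t) \mp \sqrt{\frac{2E_b}{T_b}} \sin\left(\frac{\pi t}{T_b}\right) \sin(2\pi f_c t)
\end{aligned} \tag{7.170} \]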

In the last line of (7.170), the plus sign corresponds to transmitting symbol 0 and the 
minus sign corresponds to transmitting symbol 1 . As before, we assume that the symbols 1 
and 0 in the binary sequence applied to the modulator input are equally likely, and that the 
symbols transmitted in adjacent time slots are statistically independent. Then, based on the 
representation of (7.170), we may make two observations pertaining to the in-phase and 
quadrature components of a binary FSK signal with continuous phase: 

1. The in-phase component is completely independent of the input binary wave. It
equals $\sqrt{2E_b/T_b}\cos(\pi t/T_b)$ for all time t. The power spectral density of this
component, therefore, consists of two delta functions occurring at $f = \pm 1/(2T_b)$,
each weighted by the factor $E_b/(2T_b)$.

2. The quadrature component is directly related to the input binary sequence. During
the signaling interval $0 \le t \le T_b$, it equals $-g(t)$ when we have symbol 1 and $+g(t)$
when we have symbol 0, with $g(t)$ denoting a symbol-shaping function defined by

\[ g(t) = \begin{cases} \sqrt{\dfrac{2E_b}{T_b}} \sin\left(\dfrac{\pi t}{T_b}\right), & 0 \le t \le T_b \\ 0, & \text{elsewhere} \end{cases} \]

The energy spectral density of $g(t)$ is defined by

\[ \Psi_g(f) = \frac{8E_b T_b \cos^2(\pi T_b f)}{\pi^2 (4T_b^2 f^2 - 1)^2} \]

The power spectral density of the quadrature component equals $\Psi_g(f)/T_b$. It is also
apparent that the in-phase and quadrature components of the binary FSK signal are
independent of each other. Accordingly, the baseband power spectral density of Sunde’s
FSK signal equals the sum of the power spectral densities of these two components, as
shown by

\[ S_B(f) = \frac{E_b}{2T_b}\left[\delta\left(f - \frac{1}{2T_b}\right) + \delta\left(f + \frac{1}{2T_b}\right)\right] + \frac{8E_b \cos^2(\pi T_b f)}{\pi^2 (4T_b^2 f^2 - 1)^2} \tag{7.173} \]

From Chapter 4, we recall the following relationship between the baseband and modulated
power spectra:

\[ S_S(f) = \frac{1}{4}\left[S_B(f - f_c) + S_B(f + f_c)\right] \tag{7.174} \]

where $f_c$ is the carrier frequency. Therefore, substituting (7.173) into (7.174), we find that
the power spectrum of the binary FSK signal contains two discrete frequency components,
one located at $(f_c + 1/(2T_b)) = f_1$ and the other located at $(f_c - 1/(2T_b)) = f_2$, with their average
powers adding up to one-half the total power of the binary FSK signal. The presence of
these two discrete frequency components serves a useful purpose: it provides a practical
basis for synchronizing the receiver with the transmitter.

Examining (7.173), we may make the following statement:

The baseband power spectral density of a binary FSK signal with continuous phase
ultimately falls off as the inverse fourth power of frequency.
In Figure 7.15, we plotted the baseband power spectra of (7.111) and (7.173). (To simplify
matters, we have only plotted the results for positive frequencies.) In both cases, $S_B(f)$ is
shown normalized with respect to $2E_b$, and the frequency is normalized with respect to the
bit rate $R_b = 1/T_b$. The difference in the falloff rates of these spectra can be explained on
the basis of the pulse shape $g(t)$. The smoother the pulse, the faster the drop of spectral
tails to zero. Thus, since binary FSK with continuous phase has a smoother pulse shape, it
has lower sidelobes than binary PSK does.

Suppose, next, the FSK signal exhibits phase discontinuity at the inter-bit switching
instants, which arises when the two oscillators supplying the basis functions with
frequencies $f_1$ and $f_2$ operate independently of each other. In this discontinuous scenario,
we find that the power spectral density ultimately falls off as the inverse square of frequency.
Accordingly, we may state:

A binary FSK signal with continuous phase does not produce as much interference outside
the signal band of interest as a corresponding FSK signal with discontinuous phase does.

The important point to take from this statement is summed up as follows: when
interference is an issue of practical concern, continuous FSK is preferred over its
discontinuous counterpart. However, this advantage of continuous FSK is gained at the
expense of increased system complexity.
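The difference in falloff rates can be made concrete by evaluating the continuous part of (7.173) alongside the binary PSK spectrum $2E_b\,\mathrm{sinc}^2(T_b f)$ of (7.111) at a few sidelobe peaks. This is a sketch under stated assumptions: the delta-function terms of (7.173) are omitted, the removable singularity at $T_b f = 1/2$ is replaced by its limiting value $E_b/2$, and the function names are illustrative.

```python
import math

def sunde_fsk_continuous_psd(f, Eb, Tb):
    """Continuous part of (7.173): 8 Eb cos^2(pi Tb f) / (pi^2 (4 Tb^2 f^2 - 1)^2)."""
    x = Tb * f
    denom = 4.0 * x * x - 1.0
    if abs(denom) < 1e-9:
        return Eb / 2.0                  # limiting value at f = +/- 1/(2 Tb)
    return 8.0 * Eb * math.cos(math.pi * x) ** 2 / (math.pi ** 2 * denom ** 2)

def bpsk_baseband_psd(f, Eb, Tb):
    """Baseband PSD of binary PSK: 2 Eb sinc^2(Tb f), with sinc(x) = sin(pi x)/(pi x)."""
    x = Tb * f
    s = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    return 2.0 * Eb * s * s

# Sidelobe peaks: Tb*f = 2, 3, 4 for Sunde's FSK and Tb*f = 2.5, 3.5, 4.5 for binary PSK.
for n in (2, 3, 4):
    fsk = sunde_fsk_continuous_psd(float(n), 1.0, 1.0) / 2.0     # normalized by 2 Eb
    psk = bpsk_baseband_psd(n + 0.5, 1.0, 1.0) / 2.0             # normalized by 2 Eb
    print(n, fsk, psk)
```

The binary PSK sidelobes decay as the inverse square of frequency, whereas the continuous part of the Sunde FSK spectrum decays as the inverse fourth power, so the FSK values drop much faster from one sidelobe to the next.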


In the coherent detection of binary FSK signal, the phase information contained in the 
received signal is not fully exploited, other than to provide for synchronization of the 
receiver to the transmitter. We now show that by proper use of the continuous-phase 
property when performing detection it is possible to improve the noise performance of the 
receiver significantly. Here again, this improvement is achieved at the expense of 
increased system complexity. 

Consider a continuous-phase frequency-shift keying (CPFSK) signal, which is defined
for the signaling interval $0 \le t \le T_b$ as follows:

\[ s(t) = \begin{cases} \sqrt{\dfrac{2E_b}{T_b}} \cos[2\pi f_1 t + \theta(0)] & \text{for symbol 1} \\ \sqrt{\dfrac{2E_b}{T_b}} \cos[2\pi f_2 t + \theta(0)] & \text{for symbol 0} \end{cases} \tag{7.175} \]

where $E_b$ is the transmitted signal energy per bit and $T_b$ is the bit duration. The defining
equation (7.175) distinguishes itself from that of (7.151) in using the phase $\theta(0)$. This new
term, denoting the value of the phase at time t = 0, sums up the past history of the FM
process up to time t = 0. The frequencies $f_1$ and $f_2$ are sent in response to binary symbols 1
and 0, respectively, applied to the modulator input.

Another useful way of representing the CPFSK signal $s(t)$ is to express it as a
conventional angle-modulated signal:

\[ s(t) = \sqrt{\frac{2E_b}{T_b}} \cos[2\pi f_c t + \theta(t)] \tag{7.176} \]

where $\theta(t)$ is the phase of $s(t)$ at time t. When the phase $\theta(t)$ is a continuous function of
time, we find that the modulated signal $s(t)$ is itself also continuous at all times, including
the inter-bit switching times. The phase $\theta(t)$ of a CPFSK signal increases or decreases
linearly with time during each bit duration of $T_b$ seconds, as shown by

\[ \theta(t) = \theta(0) \pm \frac{\pi h}{T_b}\, t, \qquad 0 \le t \le T_b \tag{7.177} \]

where the plus sign corresponds to sending symbol 1 and the minus sign corresponds to
sending symbol 0; the dimensionless parameter h is to be defined. Substituting (7.177)
into (7.176), and then comparing the angle of the cosine function with that of (7.175), we
deduce the following pair of relations:

\[ f_c + \frac{h}{2T_b} = f_1 \tag{7.178} \]

\[ f_c - \frac{h}{2T_b} = f_2 \tag{7.179} \]

Solving this pair of equations for $f_c$ and h, we get

\[ f_c = \frac{1}{2}(f_1 + f_2) \tag{7.180} \]

and

\[ h = T_b (f_1 - f_2) \tag{7.181} \]

The nominal carrier frequency $f_c$ is, therefore, the arithmetic mean of the transmitted
frequencies $f_1$ and $f_2$. The difference between the frequencies $f_1$ and $f_2$, normalized with
respect to the bit rate $1/T_b$, defines the dimensionless parameter h, which is referred to as
the deviation ratio.

Phase Trellis 

From (7.177) we find that, at time $t = T_b$,

\[ \theta(T_b) - \theta(0) = \begin{cases} \pi h & \text{for symbol 1} \\ -\pi h & \text{for symbol 0} \end{cases} \]

That is to say, sending symbol 1 increases the phase of a CPFSK signal $s(t)$ by $\pi h$ radians,
whereas sending symbol 0 reduces it by an equal amount.

The variation of phase $\theta(t)$ with time t follows a path consisting of a sequence of
straight lines, the slopes of which represent frequency changes. Figure 7.27 depicts
possible paths starting from t = 0. A plot like that shown in this figure is called a phase
tree. The tree makes clear the transitions of phase across successive signaling intervals.
Moreover, it is evident from the figure that the phase of a CPFSK signal is an odd or even
multiple of $\pi h$ radians at odd or even multiples of the bit duration $T_b$, respectively.

Figure 7.27: Phase tree.
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The phase tree described in Figure 7.27 is a manifestation of phase continuity, which is 
an inherent characteristic of a CPFSK signal. To appreciate the notion of phase continuity, 
let us go back for a moment to Sunde’s FSK, which is also a CPFSK signal as previously 
described. In this case, the deviation ratio h is exactly unity. Hence, according to Figure 
7.27, the phase change over one bit interval is ±jt radians. But, a change of +n radians is 
exactly the same as a change of -n radians, modulo 2n. It follows, therefore, that in the 
case of Sunde’s FSK there is no memory; that is, knowing which particular change 
occurred in the previous signaling interval provides no help in the current signaling 
interval. 

In contrast, we have a completely different situation when the deviation ratio $h$ is
assigned the special value of 1/2. We now find that the phase can take on only the two
values $\pm\pi/2$ at odd multiples of $T_b$, and only the two values 0 and $\pi$ at even
multiples of $T_b$, as in Figure 7.28. This second graph is called a phase trellis, since a
"trellis" is a treelike structure with re-emerging branches. Each path from left to right
through the trellis of Figure 7.28 corresponds to a specific binary sequence at the
transmitter input. For example, the path shown in boldface in Figure 7.28 corresponds to
the binary sequence 1101000 with $\theta(0) = 0$. Henceforth, we focus on $h = 1/2$.
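The path through the phase trellis can be traced numerically. The sketch below accumulates
the phase change of $\pm\pi h$ per bit for the sequence 1101000 with $\theta(0) = 0$ and
$h = 1/2$, and reports each phase state as a multiple of $\pi/2$, modulo $2\pi$. The helper
name `phase_states` is an assumption made for illustration.

```python
import math


def phase_states(bits, h=0.5, theta0=0.0):
    """Phase at t = 0, Tb, 2Tb, ...: +pi*h for symbol 1, -pi*h for symbol 0."""
    states = [theta0]
    for b in bits:
        step = math.pi * h if b == 1 else -math.pi * h
        states.append(states[-1] + step)
    return states


bits = [1, 1, 0, 1, 0, 0, 0]
states = phase_states(bits)

# Express each state as an integer multiple of pi/2, reduced modulo 2*pi
# (so 3 stands for 3*pi/2, which is the same state as -pi/2).
in_units_of_half_pi = [round(s / (math.pi / 2)) % 4 for s in states]
print(in_units_of_half_pi)
```

Entries at even positions (even multiples of $T_b$) are 0 or 2, that is, phase 0 or $\pi$;
entries at odd positions are 1 or 3, that is, phase $\pm\pi/2$.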

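With $h = 1/2$, we find from (7.181) that the frequency deviation (i.e., the difference
between the two signaling frequencies $f_1$ and $f_2$) equals half the bit rate; hence the
following statement:

    A frequency deviation equal to half the bit rate is the minimum frequency spacing
    that allows the two FSK signals representing symbols 1 and 0 to be coherently
    orthogonal.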


In other words, symbols 1 and 0 do not interfere with one another in the process of 
detection. It is for this reason that a CPFSK signal with a deviation ratio of one-half is 
commonly referred to as minimum shift-keying (MSK). 
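The orthogonality claim can be checked numerically. The sketch below estimates the
correlation of two unit-amplitude tones over one bit interval with a midpoint rule, first
for a spacing of $1/(2T_b)$ and then for a spacing of $1/(4T_b)$. The specific tone
frequencies and the function name `tone_correlation` are illustrative assumptions; both
pairs share the same mean frequency $1/T_b$.

```python
import math


def tone_correlation(f1, f2, Tb, steps=4000):
    """Midpoint-rule estimate of the integral of cos(2*pi*f1*t)*cos(2*pi*f2*t) on [0, Tb]."""
    dt = Tb / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += math.cos(2 * math.pi * f1 * t) * math.cos(2 * math.pi * f2 * t)
    return total * dt


Tb = 1.0
corr_msk = tone_correlation(5 / (4 * Tb), 3 / (4 * Tb), Tb)      # spacing 1/(2*Tb)
corr_narrow = tone_correlation(9 / (8 * Tb), 7 / (8 * Tb), Tb)   # spacing 1/(4*Tb)
print(corr_msk, corr_narrow)
```

The first correlation vanishes to within rounding error, whereas halving the spacing leaves
a residual correlation of about $T_b/\pi$, so the two tones are no longer orthogonal.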

Signal-Space Diagram of MSK 

Using a well-known trigonometric identity in (7.176), we may expand the CPFSK signal
$s(t)$ in terms of its in-phase and quadrature components as

$$ s(t) = \sqrt{\frac{2E_b}{T_b}} \cos\theta(t) \cos(2\pi f_c t) - \sqrt{\frac{2E_b}{T_b}} \sin\theta(t) \sin(2\pi f_c t) \tag{7.183} $$

Figure 7.28 Phase trellis; boldfaced path represents the sequence 1101000.

Consider, first, the in-phase component $\sqrt{2E_b/T_b}\cos\theta(t)$. With the deviation
ratio $h = 1/2$, we have from (7.177) that

$$ \theta(t) = \theta(0) \pm \frac{\pi}{2T_b} t, \quad 0 \le t \le T_b \tag{7.184} $$

where the plus sign corresponds to symbol 1 and the minus sign corresponds to symbol 0.
A similar result holds for $\theta(t)$ in the interval $-T_b \le t \le 0$, except that the
algebraic sign is not necessarily the same in both intervals. Since the phase $\theta(0)$ is
0 or $\pi$ depending on the past history of the modulation process, we find that in the
interval $-T_b \le t \le T_b$, the polarity of $\cos\theta(t)$ depends only on $\theta(0)$,
regardless of the sequence of 1s and 0s transmitted before or after $t = 0$. Thus, for this
time interval, the in-phase component consists of the half-cycle cosine pulse:

$$ s_I(t) = \sqrt{\frac{2E_b}{T_b}} \cos\theta(t) = \sqrt{\frac{2E_b}{T_b}} \cos\theta(0) \cos\left(\frac{\pi t}{2T_b}\right) = \pm\sqrt{\frac{2E_b}{T_b}} \cos\left(\frac{\pi t}{2T_b}\right), \quad -T_b \le t \le T_b \tag{7.185} $$

where the plus sign corresponds to $\theta(0) = 0$ and the minus sign corresponds to
$\theta(0) = \pi$. In a similar way, we may show that, in the interval $0 \le t \le 2T_b$, the
quadrature component of $s(t)$ consists of the half-cycle sine pulse:

$$ s_Q(t) = \sqrt{\frac{2E_b}{T_b}} \sin\theta(t) = \sqrt{\frac{2E_b}{T_b}} \sin\theta(T_b) \sin\left(\frac{\pi t}{2T_b}\right) = \pm\sqrt{\frac{2E_b}{T_b}} \sin\left(\frac{\pi t}{2T_b}\right), \quad 0 \le t \le 2T_b \tag{7.186} $$

where the plus sign corresponds to $\theta(T_b) = \pi/2$ and the minus sign corresponds to
$\theta(T_b) = -\pi/2$. From the discussion just presented, we see that the in-phase and
quadrature components of the MSK signal differ from each other in two important respects:

• they are in phase quadrature with respect to each other and

• the polarity of the in-phase component $s_I(t)$ depends on $\theta(0)$, whereas the polarity
of the quadrature component $s_Q(t)$ depends on $\theta(T_b)$.

Moreover, since the phase states $\theta(0)$ and $\theta(T_b)$ can each assume only one of two
possible values, any one of the following four possibilities can arise:

1. $\theta(0) = 0$ and $\theta(T_b) = \pi/2$, which occur when sending symbol 1.

2. $\theta(0) = \pi$ and $\theta(T_b) = \pi/2$, which occur when sending symbol 0.

3. $\theta(0) = \pi$ and $\theta(T_b) = -\pi/2$ (or, equivalently, $3\pi/2$ modulo $2\pi$), which
occur when sending symbol 1.

4. $\theta(0) = 0$ and $\theta(T_b) = -\pi/2$, which occur when sending symbol 0.

This fourfold scenario, in turn, means that the MSK signal itself can assume one of four
possible forms, depending on the values of the phase-state pair: $\theta(0)$ and $\theta(T_b)$.


Signal-Space Diagram 

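Examining the expansion of (7.183), we see that there are two orthonormal basis functions
$\phi_1(t)$ and $\phi_2(t)$ characterizing the generation of MSK; they are defined by the
following pair of sinusoidally modulated quadrature carriers:

$$ \phi_1(t) = \sqrt{\frac{2}{T_b}} \cos\left(\frac{\pi t}{2T_b}\right) \cos(2\pi f_c t), \quad 0 \le t \le T_b \tag{7.187} $$

$$ \phi_2(t) = \sqrt{\frac{2}{T_b}} \sin\left(\frac{\pi t}{2T_b}\right) \sin(2\pi f_c t), \quad 0 \le t \le T_b \tag{7.188} $$

With the formulation of a signal-space diagram in mind, we rewrite (7.183) in the compact
form

$$ s(t) = s_1\phi_1(t) + s_2\phi_2(t), \quad 0 \le t \le T_b \tag{7.189} $$

where the coefficients $s_1$ and $s_2$ are related to the phase states $\theta(0)$ and
$\theta(T_b)$, respectively. To evaluate $s_1$, we integrate the product $s(t)\phi_1(t)$ with
respect to time $t$ between the limits $-T_b$ and $T_b$, obtaining

$$ s_1 = \int_{-T_b}^{T_b} s(t)\phi_1(t)\,dt = \sqrt{E_b}\cos[\theta(0)], \quad -T_b \le t \le T_b \tag{7.190} $$

Similarly, to evaluate $s_2$ we integrate the product $s(t)\phi_2(t)$ with respect to time $t$
between the limits 0 and $2T_b$, obtaining

$$ s_2 = \int_0^{2T_b} s(t)\phi_2(t)\,dt = -\sqrt{E_b}\sin[\theta(T_b)], \quad 0 \le t \le 2T_b \tag{7.191} $$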


Examining (7.190) and (7.191), we now make three observations:

1. Both integrals are evaluated for a time interval equal to twice the bit duration.

2. The lower and upper limits of the integral in (7.190) used to evaluate $s_1$ are shifted
by the bit duration $T_b$ with respect to those used to evaluate $s_2$.

3. The time interval $0 \le t \le T_b$, for which the phase states $\theta(0)$ and $\theta(T_b)$
are defined, is common to both integrals.


It follows, therefore, that the signal constellation for an MSK signal is two-dimensional
(i.e., $N = 2$), with four possible message points (i.e., $M = 4$), as illustrated in the
signal-space diagram of Figure 7.29. Moving in a counterclockwise direction, the coordinates
of the message points are as follows:

$(+\sqrt{E_b}, +\sqrt{E_b})$, $(-\sqrt{E_b}, +\sqrt{E_b})$, $(-\sqrt{E_b}, -\sqrt{E_b})$, and $(+\sqrt{E_b}, -\sqrt{E_b})$.

The possible values of $\theta(0)$ and $\theta(T_b)$, corresponding to these four message
points, are also included in Figure 7.29. The signal-space diagram of MSK is thus similar to
that of QPSK in that both of them have four message points in a two-dimensional space.
However, they differ in a subtle way that should be carefully noted:

• QPSK, moving from one message point to an adjacent one, is produced by sending a
two-bit symbol (i.e., dibit).

• MSK, on the other hand, moving from one message point to an adjacent one, is
produced by sending a binary symbol, 0 or 1. However, each symbol shows up in
two opposite quadrants, depending on the value of the phase pair: $\theta(0)$ and $\theta(T_b)$.

Table 7.4 presents a summary of the values of $\theta(0)$ and $\theta(T_b)$, as well as the
corresponding values of $s_1$ and $s_2$ that are calculated for the time intervals
$-T_b \le t \le T_b$ and $0 \le t \le 2T_b$, respectively. The first column of this table
indicates whether symbol 1 or symbol 0 was sent in the interval $0 \le t \le T_b$. Note that
the coordinates of the message points, $s_1$ and $s_2$, have opposite signs when symbol 1 is
sent in this interval, but the same sign when symbol 0 is sent. Accordingly, for a given
input data sequence, we may use the entries of Table 7.4 to derive on a bit-by-bit basis the
two sequences of coefficients required to scale $\phi_1(t)$ and $\phi_2(t)$, and thereby
determine the MSK signal $s(t)$.

Table 7.4 Signal-space characterization of MSK
(entries follow from (7.190), (7.191), and the four phase-state possibilities listed earlier)

Symbol sent, 0 ≤ t ≤ T_b    θ(0)    θ(T_b)    s_1        s_2
0                           0       -π/2      +√E_b      +√E_b
1                           π       -π/2      -√E_b      +√E_b
0                           π       +π/2      -√E_b      -√E_b
1                           0       +π/2      +√E_b      -√E_b

MSK Waveforms

Figure 7.30 shows the sequences and waveforms involved in the generation of an MSK
signal for the binary sequence 1101000. The input binary sequence is shown in Figure 7.30a.
The two modulation frequencies are $f_1 = 5/(4T_b)$ and $f_2 = 3/(4T_b)$. Assuming that at
time $t = 0$ the phase $\theta(0)$ is zero, the sequence of phase states is as shown in
Figure 7.30, modulo $2\pi$. The polarities of the two sequences of factors used to scale the
time functions $\phi_1(t)$ and $\phi_2(t)$ are shown in the top lines of Figure 7.30b and c.
These two sequences are offset relative to each other by an interval equal to the bit
duration $T_b$. The waveforms of the resulting two components of $s(t)$, namely,
$s_1\phi_1(t)$ and $s_2\phi_2(t)$, are shown in Figure 7.30b and c. Adding these two modulated
waveforms, we get the desired MSK signal $s(t)$ shown in Figure 7.30d.

Figure 7.30 (a) Input binary sequence. (b) Waveform of scaled time function $s_1\phi_1(t)$.
(c) Waveform of scaled time function $s_2\phi_2(t)$. (d) Waveform of the MSK signal $s(t)$
obtained by adding $s_1\phi_1(t)$ and $s_2\phi_2(t)$ on a bit-by-bit basis.
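The bit-by-bit construction can be verified numerically for the interval $0 \le t \le T_b$.
The sketch below, written under the assumptions $E_b = 1$, $T_b = 1$, and $f_c = 1/T_b$ (the
mean of the two tones quoted above), compares the CPFSK waveform defined through its phase
with the sum $s_1\phi_1(t) + s_2\phi_2(t)$, using $s_1 = \sqrt{E_b}\cos\theta(0)$ and
$s_2 = -\sqrt{E_b}\sin\theta(T_b)$. All function names are hypothetical.

```python
import math

Eb, Tb, fc = 1.0, 1.0, 1.0   # assumed values; fc is the mean of 5/(4Tb) and 3/(4Tb)


def phi1(t):
    return math.sqrt(2 / Tb) * math.cos(math.pi * t / (2 * Tb)) * math.cos(2 * math.pi * fc * t)


def phi2(t):
    return math.sqrt(2 / Tb) * math.sin(math.pi * t / (2 * Tb)) * math.sin(2 * math.pi * fc * t)


def cpfsk(t, symbol, theta0):
    """CPFSK waveform with h = 1/2 on 0 <= t <= Tb, defined through its phase."""
    sign = 1 if symbol == 1 else -1
    theta = theta0 + sign * math.pi * t / (2 * Tb)
    return math.sqrt(2 * Eb / Tb) * math.cos(2 * math.pi * fc * t + theta)


def coefficients(symbol, theta0):
    """Signal-space coordinates (s1, s2) for one bit interval."""
    sign = 1 if symbol == 1 else -1
    theta_Tb = theta0 + sign * math.pi / 2
    return math.sqrt(Eb) * math.cos(theta0), -math.sqrt(Eb) * math.sin(theta_Tb)


max_error = 0.0
sign_products = {}
for symbol in (0, 1):
    for theta0 in (0.0, math.pi):
        s1, s2 = coefficients(symbol, theta0)
        sign_products[(symbol, theta0)] = s1 * s2
        for n in range(101):
            t = n * Tb / 100
            err = abs(cpfsk(t, symbol, theta0) - (s1 * phi1(t) + s2 * phi2(t)))
            max_error = max(max_error, err)
print(max_error)
```

The two descriptions agree to within rounding error for all four phase-state pairs, and the
product $s_1 s_2$ is negative for symbol 1 and positive for symbol 0.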


With $h = 1/2$, we may use the block diagram of Figure 7.31a to generate the MSK signal.
The advantage of this method of generating MSK signals is that the signal coherence and
deviation ratio are largely unaffected by variations in the input data rate. Two input
sinusoidal waves, one of frequency $f_c = n_c/(4T_b)$ for some fixed integer $n_c$ and the
other of frequency $1/(4T_b)$, are first applied to a product modulator. This modulator
produces two phase-coherent sinusoidal waves at frequencies $f_1$ and $f_2$, which are related
to the carrier frequency $f_c$ and the bit rate $1/T_b$ in accordance with (7.178) and (7.179)
for deviation ratio $h = 1/2$. These two sinusoidal waves are separated from each other by
two narrowband filters, one centered at $f_1$ and the other at $f_2$. The resulting filter
outputs are next linearly combined to produce the pair of quadrature carriers or orthonormal
basis functions $\phi_1(t)$ and $\phi_2(t)$. Finally, $\phi_1(t)$ and $\phi_2(t)$ are multiplied
with two binary waves $a_1(t)$ and $a_2(t)$, both of which have a bit rate equal to
$1/(2T_b)$. These two binary waves are extracted from the incoming binary sequence in the
manner described in Example 7.

Figure 7.31 Block diagrams for (a) MSK transmitter and (b) coherent MSK receiver.

Figure 7.31b shows the block diagram of the coherent MSK receiver. The received
signal $x(t)$ is correlated with $\phi_1(t)$ and $\phi_2(t)$. In both cases, the integration
interval is $2T_b$ seconds, and the integration in the quadrature channel is delayed by $T_b$
seconds with respect to that in the in-phase channel. The resulting in-phase and quadrature
channel correlator outputs, $x_1$ and $x_2$, are each compared with a threshold of zero;
estimates of the phase $\theta(0)$ and $\theta(T_b)$ are then derived in the manner described
previously. Finally, these phase decisions are interleaved so as to estimate the original
binary sequence at the transmitter input with the minimum average probability of symbol
error in an AWGN channel.


In the case of an AWGN channel, the received signal is given by

$$ x(t) = s(t) + w(t) $$

where $s(t)$ is the transmitted MSK signal and $w(t)$ is the sample function of a white
Gaussian noise process of zero mean and power spectral density $N_0/2$. To decide whether
symbol 1 or symbol 0 was sent in the interval $0 \le t \le T_b$, say, we have to establish a
procedure for the use of $x(t)$ to detect the phase states $\theta(0)$ and $\theta(T_b)$.

For the optimum detection of $\theta(0)$, we project the received signal $x(t)$ onto the
reference signal $\phi_1(t)$ over the interval $-T_b \le t \le T_b$, obtaining

$$ x_1 = \int_{-T_b}^{T_b} x(t)\phi_1(t)\,dt = s_1 + w_1 \tag{7.192} $$

where $s_1$ is as defined by (7.190) and $w_1$ is the sample value of a Gaussian random
variable of zero mean and variance $N_0/2$. From the signal-space diagram of Figure 7.29,
we see that if $x_1 > 0$, the receiver chooses the estimate $\hat{\theta}(0) = 0$. On the other
hand, if $x_1 < 0$, it chooses the estimate $\hat{\theta}(0) = \pi$.

Similarly, for the optimum detection of $\theta(T_b)$, we project the received signal $x(t)$
onto the second reference signal $\phi_2(t)$ over the interval $0 \le t \le 2T_b$, obtaining

$$ x_2 = \int_0^{2T_b} x(t)\phi_2(t)\,dt = s_2 + w_2, \quad 0 \le t \le 2T_b \tag{7.193} $$

where $s_2$ is as defined by (7.191) and $w_2$ is the sample value of another independent
Gaussian random variable of zero mean and variance $N_0/2$. Referring again to the
signal-space diagram of Figure 7.29, we see that if $x_2 > 0$, the receiver chooses the
estimate $\hat{\theta}(T_b) = -\pi/2$. If, however, $x_2 < 0$, the receiver chooses the estimate
$\hat{\theta}(T_b) = \pi/2$.

To reconstruct the original binary sequence, we interleave the above two sets of phase
estimates in accordance with Table 7.4, by proceeding as follows:

• If the estimates are $\hat{\theta}(0) = 0$ and $\hat{\theta}(T_b) = -\pi/2$, or alternatively
$\hat{\theta}(0) = \pi$ and $\hat{\theta}(T_b) = \pi/2$, then the receiver decides in favor of
symbol 0.

• If, on the other hand, the estimates are $\hat{\theta}(0) = \pi$ and
$\hat{\theta}(T_b) = -\pi/2$, or alternatively $\hat{\theta}(0) = 0$ and
$\hat{\theta}(T_b) = \pi/2$, then the receiver decides in favor of symbol 1.
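The decision logic just described reduces to two sign tests. The following sketch maps a
pair of correlator outputs $(x_1, x_2)$ to the two phase estimates and then to a symbol
decision; it relies on the observation that the coordinates have the same sign for symbol 0
and opposite signs for symbol 1. The function name `msk_decide` is hypothetical, and ties at
exactly zero are resolved arbitrarily.

```python
import math


def msk_decide(x1, x2):
    """Return (theta0_hat, thetaTb_hat, symbol) from the two correlator outputs."""
    theta0_hat = 0.0 if x1 > 0 else math.pi               # threshold test on x1
    thetaTb_hat = -math.pi / 2 if x2 > 0 else math.pi / 2  # threshold test on x2
    same_sign = (x1 > 0) == (x2 > 0)
    symbol = 0 if same_sign else 1
    return theta0_hat, thetaTb_hat, symbol


for x1, x2 in [(0.9, 1.1), (-1.2, 0.7), (-0.8, -1.0), (0.8, -0.3)]:
    print((x1, x2), msk_decide(x1, x2))
```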

Most importantly, examining the signal-space diagram of Figure 7.29, we see that the
coordinates of the four message points characterizing the MSK signal are identical to those
of the QPSK signal in Figure 7.16. Moreover, the zero-mean noise variables in (7.192) and
(7.193) have exactly the same variance as those for the QPSK signal in (7.118) and (7.119).
It follows, therefore, that the BER for the coherent detection of MSK signals is given by

$$ P_e = Q\left(\sqrt{\frac{2E_b}{N_0}}\right) \tag{7.194} $$

which is the same as that of QPSK in (7.126). In both MSK and QPSK, this good
performance is the result of coherent detection being performed in the receiver on the
basis of observations over $2T_b$ seconds.
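For a feel of the numbers, the BER of coherent MSK can be evaluated with the Q-function
expressed through the complementary error function, $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$.
The sketch below assumes the BER has the form $Q(\sqrt{2E_b/N_0})$, the same as for QPSK; the
function names are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) written in terms of erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def msk_ber(eb_n0_db):
    """BER of coherently detected MSK, assumed equal to Q(sqrt(2*Eb/N0))."""
    eb_n0 = 10 ** (eb_n0_db / 10)
    return q_function(math.sqrt(2 * eb_n0))


for snr_db in (0, 4, 8, 9.6):
    print(snr_db, msk_ber(snr_db))
```

Under this assumption, a BER in the vicinity of $10^{-5}$ is reached at an $E_b/N_0$ of
roughly 9.6 dB.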


As with the binary FSK signal, we assume that the input binary wave is random, with 
symbols 1 and 0 being equally likely and the symbols sent during adjacent time slots being 
statistically independent. Under these assumptions, we make three observations: 

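1. Depending on the value of phase state $\theta(0)$, the in-phase component equals $+g(t)$
or $-g(t)$, where the pulse-shaping function

$$ g(t) = \begin{cases} \sqrt{\dfrac{2E_b}{T_b}} \cos\left(\dfrac{\pi t}{2T_b}\right), & -T_b \le t \le T_b \\ 0, & \text{otherwise} \end{cases} \tag{7.195} $$

The energy spectral density of $g(t)$ is

$$ \psi_g(f) = \frac{32E_bT_b}{\pi^2} \left[\frac{\cos(2\pi T_b f)}{16T_b^2 f^2 - 1}\right]^2 \tag{7.196} $$

The power spectral density of the in-phase component equals $\psi_g(f)/(2T_b)$.

2. Depending on the value of the phase state $\theta(T_b)$, the quadrature component equals
$+g(t)$ or $-g(t)$, where we now have

$$ g(t) = \begin{cases} \sqrt{\dfrac{2E_b}{T_b}} \sin\left(\dfrac{\pi t}{2T_b}\right), & 0 \le t \le 2T_b \\ 0, & \text{otherwise} \end{cases} \tag{7.197} $$

Despite the difference in which the time interval over two adjacent time slots is
defined in (7.195) and (7.197), we get the same energy spectral density as in (7.196).
Hence, the in-phase and quadrature components have the same power spectral
density.

3. The in-phase and quadrature components of the MSK signal are statistically
independent; it follows that the baseband power spectral density of $s(t)$ is given by

$$ S_B(f) = 2\left[\frac{\psi_g(f)}{2T_b}\right] = \frac{32E_b}{\pi^2} \left[\frac{\cos(2\pi T_b f)}{16T_b^2 f^2 - 1}\right]^2 \tag{7.198} $$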


A plot of the baseband power spectrum of (7.198) is included in Figure 7.19, where the
power spectrum is normalized with respect to $4E_b$ and the frequency $f$ is normalized with
respect to the bit rate $1/T_b$. Figure 7.19 also includes the corresponding plot of (7.128)
for the QPSK signal. As stated previously, for $f \gg 1/T_b$ the baseband power spectral
density of the MSK signal falls off as the inverse fourth power of frequency, whereas in the
case of the QPSK signal it falls off as the inverse square of frequency. Accordingly, MSK
does not produce as much interference outside the signal band of interest as QPSK does. This
is a desirable characteristic of MSK, especially when the digital communication system
operates with a bandwidth limitation in an interfering environment.


From the detailed study of MSK just presented, we may summarize its desirable
properties:

• modulated signal with constant envelope; 

• relatively narrow-bandwidth occupancy; 

• coherent detection performance equivalent to that of QPSK. 

However, the out-of-band spectral characteristics of MSK signals, as good as they are, still
do not satisfy the stringent requirements of certain applications such as wireless
communications. To illustrate this limitation, we find from (7.198) that, at $T_b f = 0.5$,
the baseband power spectral density of the MSK signal drops by only
$10\log_{10} 9 = 9.54$ dB below its midband value. Hence, when the MSK signal is assigned a
transmission bandwidth of $1/T_b$, the adjacent channel interference of a
wireless-communication system using MSK is not low enough to satisfy the practical
requirements of a multiuser-communications environment.
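The 9.54 dB figure follows directly from the bracketed factor of (7.198). The sketch below
evaluates the baseband power spectral density normalized to its value at $f = 0$ as a
function of the product $fT_b$; the function name `msk_psd_normalized` is hypothetical, and
the removable singularity at $fT_b = 1/4$ is replaced by its limit, $(\pi/4)^2$.

```python
import math


def msk_psd_normalized(f_Tb):
    """S_B(f)/S_B(0) for MSK as a function of the normalized frequency f*Tb."""
    den = 16 * f_Tb ** 2 - 1
    if abs(den) < 1e-12:
        # Limit of [cos(2*pi*x)/(16*x^2 - 1)]^2 as x -> 1/4 (0/0 form).
        return (math.pi / 4) ** 2
    return (math.cos(2 * math.pi * f_Tb) / den) ** 2


drop_db = -10 * math.log10(msk_psd_normalized(0.5))
print(round(drop_db, 2))   # drop below the midband value at f*Tb = 0.5
```

At $fT_b = 0.5$ the bracket equals $-1/3$, so the density is $1/9$ of its midband value.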

Recognizing that the MSK signal can be generated by direct FM of a voltage-controlled 
oscillator, we may overcome this practical limitation of MSK by modifying its power 
spectrum into a more compact form while maintaining the constant-envelope property of 
the MSK signal. This modification can be achieved through the use of a premodulation 
low-pass filter, hereafter referred to as a baseband pulse-shaping filter. Desirably, the 
pulse-shaping filter should satisfy the following three conditions: 

• frequency response with narrow bandwidth and sharp cutoff characteristics; 

• impulse response with relatively low overshoot; and 

• evolution of a phase trellis with the carrier phase of the modulated signal assuming
the two values $\pm\pi/2$ at odd multiples of the bit duration $T_b$ and the two values 0 and
$\pi$ at even multiples of $T_b$, as in MSK.

The frequency-response condition is needed to suppress the high-frequency components
of the modified frequency-modulated signal. The impulse-response condition avoids
excessive deviations in the instantaneous frequency of the modified frequency-modulated
signal. Finally, the condition imposed on phase-trellis evolution ensures that the modified
frequency-modulated signal can be coherently detected in the same way as the MSK
signal, or it can be noncoherently detected as a simple binary FSK signal if so desired.

These three conditions can be satisfied by passing an NRZ-level-encoded binary data 
stream through a baseband pulse-shaping filter whose impulse response (and, likewise, its 
frequency response) is defined by a Gaussian function. The resulting method of binary 
FM is naturally referred to as Gaussian-filtered minimum-shift keying (GMSK). 

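Let $W$ denote the 3 dB baseband bandwidth of the pulse-shaping filter. We may then
define the transfer function $H(f)$ and impulse response $h(t)$ of the pulse-shaping filter as:

$$ H(f) = \exp\left[-\frac{\ln 2}{2}\left(\frac{f}{W}\right)^2\right] \tag{7.199} $$

and

$$ h(t) = \sqrt{\frac{2\pi}{\ln 2}}\, W \exp\left(-\frac{2\pi^2}{\ln 2} W^2 t^2\right) \tag{7.200} $$

where ln denotes the natural logarithm. The response of this Gaussian filter to a
rectangular pulse of unit amplitude and duration $T_b$, centered on the origin, is given by

$$ g(t) = \int_{-T_b/2}^{T_b/2} h(t - \tau)\,d\tau = \sqrt{\frac{2\pi}{\ln 2}}\, W \int_{-T_b/2}^{T_b/2} \exp\left[-\frac{2\pi^2}{\ln 2} W^2 (t - \tau)^2\right] d\tau \tag{7.201} $$

The pulse response $g(t)$ in (7.201) provides the basis for building the GMSK modulator, with
the dimensionless time-bandwidth product $WT_b$ playing the role of a design parameter.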
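The pulse response can be evaluated without numerical integration by noting that $h(t)$ is a
unit-area Gaussian with standard deviation $\sigma = \sqrt{\ln 2}/(2\pi W)$, so that the
integral in (7.201) is a difference of two error functions. This closed form is our own
rearrangement rather than a formula quoted from the text; the sketch below cross-checks it
against a direct midpoint-rule evaluation of (7.201). Function names are hypothetical.

```python
import math


def gmsk_pulse(t, W, Tb):
    """Closed-form g(t): difference of two Gaussian CDFs, written with erf."""
    sigma = math.sqrt(math.log(2)) / (2 * math.pi * W)
    a = (t + Tb / 2) / (sigma * math.sqrt(2))
    b = (t - Tb / 2) / (sigma * math.sqrt(2))
    return 0.5 * (math.erf(a) - math.erf(b))


def gmsk_pulse_numeric(t, W, Tb, steps=4000):
    """Direct midpoint-rule evaluation of the integral in (7.201)."""
    scale = math.sqrt(2 * math.pi / math.log(2)) * W
    k = 2 * math.pi ** 2 * W ** 2 / math.log(2)
    dtau = Tb / steps
    total = 0.0
    for n in range(steps):
        tau = -Tb / 2 + (n + 0.5) * dtau
        total += math.exp(-k * (t - tau) ** 2)
    return scale * total * dtau


Tb = 1.0
for WTb in (0.2, 0.25, 0.3):
    W = WTb / Tb
    print(WTb, gmsk_pulse(0.0, W, Tb), gmsk_pulse(1.5 * Tb, W, Tb))
```

As $WT_b$ is reduced, the peak of $g(t)$ falls and its tails grow, which is the time
spreading that the design parameter trades against spectral compactness.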


Figure 7.32 Frequency-shaping pulse $g(t)$ of (7.201), shifted in time by $2.5T_b$ and
truncated at $\pm 2.5T_b$, for varying time-bandwidth product $WT_b$.

Unfortunately, the pulse response $g(t)$ is noncausal and, therefore, not physically
realizable for real-time operation. Specifically, $g(t)$ is nonzero for $t < -T_b/2$, where
$t = -T_b/2$ is the time at which the input rectangular pulse (symmetrically positioned
around the origin) is applied to the Gaussian filter. For a causal response, $g(t)$ must be
truncated and shifted in time. Figure 7.32 presents plots of $g(t)$, which has been truncated
at $t = \pm 2.5T_b$ and then shifted in time by $2.5T_b$. The plots shown here are for three
different settings: $WT_b$ = 0.2, 0.25, and 0.3. Note that as $WT_b$ is reduced, the time
spread of the frequency-shaping pulse is correspondingly increased.

Figure 7.33 shows the machine-computed power spectra of MSK signals (expressed in
decibels) versus the normalized frequency difference $(f - f_c)T_b$, where $f_c$ is the
mid-band frequency and $T_b$ is the bit duration. The results plotted in Figure 7.33 are for
varying values of the time-bandwidth product $WT_b$. From this figure we may make the
following observations:

• The curve for the limiting condition $WT_b = \infty$ corresponds to the case of ordinary
MSK.

• When $WT_b$ is less than unity, increasingly more of the transmit power is
concentrated inside the passband of the GMSK signal.

An undesirable feature of GMSK is that the processing of NRZ binary data by a Gaussian
filter generates a modulating signal that is no longer confined to a single bit interval as in
ordinary MSK, which is readily apparent from Figure 7.32. Stated in another way, the tails
of the Gaussian impulse response of the pulse-shaping filter cause the modulating signal to
spread out to adjacent symbol intervals. The net result is the generation of intersymbol
interference, the extent of which increases with decreasing $WT_b$. In light of this
discussion and the various plots presented in Figure 7.33, we find that the value assigned to
the time-bandwidth product $WT_b$ offers a tradeoff between spectral compactness and
system-performance loss.



Figure 7.33 Power spectra of MSK and GMSK signals for varying time-bandwidth product.

Figure 7.34 Theoretical $E_b/N_0$ degradation of GMSK for varying time-bandwidth product.


To explore the issue of performance degradation resulting from the use of GMSK
compared with MSK, consider the coherent detection in the presence of AWGN.
Recognizing that GMSK is a special kind of binary FM, we may express its average
probability of symbol error $P_e$ by the empirical formula

$$ P_e = Q\left(\sqrt{\frac{\alpha E_b}{N_0}}\right) \tag{7.202} $$

where, as before, $E_b$ is the signal energy per bit and $N_0/2$ is the noise spectral density.
The factor $\alpha$ is a constant whose value depends on the time-bandwidth product $WT_b$.
Comparing (7.202) for GMSK with (7.194) for ordinary MSK, we may view
$10\log_{10}(\alpha/2)$, expressed in decibels, as a measure of performance degradation of
GMSK compared with ordinary MSK. Figure 7.34 shows the machine-computed value of
$10\log_{10}(\alpha/2)$ versus $WT_b$. For ordinary MSK we have $WT_b = \infty$, in which case
(7.202) with $\alpha = 2$ assumes exactly the same form as (7.194) and there is no degradation
in performance, which is confirmed by Figure 7.34. For GMSK with $WT_b = 0.3$ we find from
Figure 7.34 that there is a degradation in performance of about 0.46 dB, which corresponds
to $\alpha/2 = 0.9$. This degradation in performance is a small price to pay for the highly
desirable spectral compactness of the GMSK signal.
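The relation between $\alpha$ and the quoted degradation is a one-line computation. The
sketch below evaluates $10\log_{10}(\alpha/2)$ and the corresponding error probability,
assuming the empirical formula takes the form $Q(\sqrt{\alpha E_b/N_0})$; the value
$\alpha = 1.8$ is simply $2 \times 0.9$, the case quoted for $WT_b = 0.3$. Function names
are hypothetical.

```python
import math


def q_function(x):
    return 0.5 * math.erfc(x / math.sqrt(2))


def degradation_db(alpha):
    """Shift of GMSK relative to MSK, 10*log10(alpha/2), in dB (negative means a loss)."""
    return 10 * math.log10(alpha / 2)


def gmsk_ber(eb_n0_db, alpha):
    """Error probability under the assumed form Q(sqrt(alpha*Eb/N0))."""
    eb_n0 = 10 ** (eb_n0_db / 10)
    return q_function(math.sqrt(alpha * eb_n0))


print(round(degradation_db(1.8), 2))          # about -0.46 dB
print(gmsk_ber(10, 1.8), gmsk_ber(10, 2.0))   # GMSK versus MSK at 10 dB
```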


Consider next the M-ary version of FSK, for which the transmitted signals are defined by

$$ s_i(t) = \sqrt{\frac{2E}{T}} \cos\left[\frac{\pi}{T}(n_c + i)t\right], \quad 0 \le t \le T \tag{7.203} $$

where $i = 1, 2, \ldots, M$, and the carrier frequency $f_c = n_c/(2T)$ for some fixed integer
$n_c$. The transmitted symbols are of equal duration $T$ and have equal energy $E$. Since the
individual signal frequencies are separated by $1/(2T)$ Hz, the M-ary FSK signals in (7.203)
constitute an orthogonal set; that is,

$$ \int_0^T s_i(t)s_j(t)\,dt = 0, \quad i \ne j \tag{7.204} $$

Hence, we may use the transmitted signals $s_i(t)$ themselves, except for energy
normalization, as a complete orthonormal set of basis functions, as shown by

$$ \phi_i(t) = \frac{1}{\sqrt{E}} s_i(t), \quad \text{for } 0 \le t \le T \text{ and } i = 1, 2, \ldots, M \tag{7.205} $$


Accordingly, the M-ary FSK is described by an M-dimensional signal-space diagram.

For the coherent detection of M-ary FSK signals, the optimum receiver consists of a
bank of $M$ correlators or matched filters, with $\phi_i(t)$ of (7.205) providing the basis
functions. At the sampling times $t = kT$, the receiver makes decisions based on the largest
matched filter output in accordance with the maximum likelihood decoding rule. An exact
formula for the probability of symbol error is, however, difficult to derive for a coherent
M-ary FSK system. Nevertheless, we may use the union bound of (7.88) to place an upper
bound on the average probability of symbol error for M-ary FSK. Specifically, since the
minimum distance $d_{\min}$ in M-ary FSK is $\sqrt{2E}$, using (7.87) we get (assuming
equiprobable symbols)

$$ P_e \le (M - 1)\, Q\left(\sqrt{\frac{E}{N_0}}\right) \tag{7.206} $$

For fixed $M$, this bound becomes increasingly tight as the ratio $E/N_0$ is increased. Indeed,
it becomes a good approximation to $P_e$ for values of $P_e \le 10^{-3}$. Moreover, for $M = 2$
(i.e., binary FSK), the bound of (7.206) becomes an equality; see (7.168).
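The union bound is straightforward to evaluate. The sketch below assumes the bound has the
form $(M - 1)Q(\sqrt{E/N_0})$, which follows from a minimum distance of $\sqrt{2E}$ between
any two of the $M$ orthogonal signals; the function names are hypothetical.

```python
import math


def q_function(x):
    return 0.5 * math.erfc(x / math.sqrt(2))


def mfsk_union_bound(M, e_n0_db):
    """Upper bound on the symbol error probability of coherent M-ary FSK."""
    e_n0 = 10 ** (e_n0_db / 10)
    return (M - 1) * q_function(math.sqrt(e_n0))


for M in (2, 4, 8, 16):
    print(M, mfsk_union_bound(M, 10.0))
```

For $M = 2$ the bound coincides with the error probability of coherent binary FSK, and for a
fixed $E/N_0$ it grows in proportion to $M - 1$.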


Power Spectra of M-ary FSK Signals 

The spectral analysis of M-ary FSK signals is much more complicated than that of M-ary
PSK signals. A case of particular interest occurs when the frequencies assigned to the
multilevels make the frequency spacing uniform and the frequency deviation $h = 1/2$. That
is, the $M$ signal frequencies are separated by $1/(2T)$, where $T$ is the symbol duration. For
$h = 1/2$, the baseband power spectral density of M-ary FSK signals is plotted in Figure
7.35 for $M$ = 2, 4, 8.


Bandwidth Efficiency of M-ary FSK Signals 

When the orthogonal signals of an M-ary FSK signal are detected coherently, the adjacent
signals need only be separated from each other by a frequency difference $1/(2T)$ so as to
maintain orthogonality. Hence, we may define the channel bandwidth required to transmit
M-ary FSK signals as

$$ B = \frac{M}{2T} \tag{7.207} $$

For multilevels with frequency assignments that make the frequency spacing uniform and
equal to $1/(2T)$, the bandwidth $B$ of (7.207) contains a large fraction of the signal power.
This is readily confirmed by looking at the baseband power spectral plots shown in Figure
7.35. From (7.133) we recall that the symbol period $T$ is equal to $T_b \log_2 M$. Hence, using
$R_b = 1/T_b$, we may redefine the channel bandwidth $B$ for M-ary FSK signals as

$$ B = \frac{R_b M}{2\log_2 M} \tag{7.208} $$

The bandwidth efficiency of M-ary FSK signals is therefore

$$ \rho = \frac{R_b}{B} = \frac{2\log_2 M}{M} \tag{7.209} $$

Table 7.5 gives the values of $\rho$ calculated from (7.209) for varying $M$.

Comparing Tables 7.3 and 7.5, we see that increasing the number of levels $M$ tends to
increase the bandwidth efficiency of M-ary PSK signals, but it also tends to decrease the
bandwidth efficiency of M-ary FSK signals. In other words, M-ary PSK signals are
spectrally efficient, whereas M-ary FSK signals are spectrally inefficient.
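The entries of Table 7.5 can be regenerated from the bandwidth-efficiency formula
$\rho = 2\log_2 M / M$. The sketch below is a minimal check; the function name is
hypothetical.

```python
import math


def mfsk_bandwidth_efficiency(M):
    """Bandwidth efficiency rho = Rb/B = 2*log2(M)/M of M-ary FSK, in bits/(s*Hz)."""
    return 2 * math.log2(M) / M


table = {M: mfsk_bandwidth_efficiency(M) for M in (2, 4, 8, 16, 32, 64)}
for M, rho in table.items():
    print(M, rho)
```

The efficiency stays at 1 for $M = 2$ and $M = 4$ and then falls steadily, since the
occupied bandwidth grows linearly with $M$ while the number of bits per symbol grows only
logarithmically.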


Table 7.5 Bandwidth efficiency of M-ary FSK signals

M                  2      4      8       16     32       64
ρ (bits/(s·Hz))    1      1      0.75    0.5    0.3125   0.1875

Comparison of M-ary PSK and M-ary FSK from an
Information-Theoretic Viewpoint


Bandwidth efficiency, as just discussed, provides one way of contrasting the capabilities of 
M-ary PSK and M-ary FSK. Another way of contrasting the capabilities of these two 
generalized digital modulation schemes is to look at the bandwidth-power tradeoff viewed 
in light of Shannon’s information capacity law, which was discussed previously in Chapter 5. 

Consider, first, an M-ary PSK system that employs a nonorthogonal set of $M$ phase-shifted
signals for the transmission of binary data over an AWGN channel. Referring back
to Section 7.6, recall that (7.137) defines the bandwidth efficiency of the M-ary PSK
system, using the null-to-null bandwidth. Based on this equation, Figure 7.36 plots the
operating points for different phase-level numbers $M$ = 2, 4, 8, 16, 32, 64. Each point on
the operating curve corresponds to an average probability of symbol error $P_e = 10^{-5}$; this
value of $P_e$ is small enough to assume "error-free" transmission. Given this fixed value of
$P_e$, (7.132) for the coherent detection of M-ary PSK is used to calculate the symbol
energy-to-noise density ratio $E/N_0$ and, therefore, $E_b/N_0$ for a prescribed $M$; Figure 7.36
also includes the capacity boundary for the ideal transmission system, computed in
accordance with (5.99). Figure 7.36 teaches us the following:


Consider next an M-ary FSK system that uses an orthogonal set of $M$ frequency-shifted
signals for the transmission of binary data over an AWGN channel. As discussed in
Section 7.8, the separation between adjacent signal frequencies in the set is $1/(2T)$, where
$T$ is the symbol period. The bandwidth efficiency of M-ary FSK is defined in (7.209), the
formulation of which also invokes the null-to-null bandwidth. Using this equation, Figure
7.37 plots the operating points for different frequency-level numbers $M$ = 2, 4, 8, 16, 32,
64 for the same average probability of symbol error, namely $P_e = 10^{-5}$. Given this fixed
value of $P_e$, (7.206) is used to calculate the $E/N_0$ and, therefore, $E_b/N_0$ required for a
prescribed value of $M$. As in Figure 7.36 for M-ary PSK, Figure 7.37 for M-ary FSK also
includes the capacity boundary for the ideal condition of error-free transmission. Figure
7.37 shows that increasing $M$ in M-ary FSK has the opposite effect to that in M-ary PSK.
In more specific terms, we may state the following:


In other words, in an information-theoretic context, M-ary FSK behaves better than M-ary 
PSK. 
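The trend for M-ary FSK can be reproduced approximately. The sketch below treats the union
bound $(M - 1)Q(\sqrt{E/N_0})$ as an equality, solves it by bisection for the $E/N_0$ that
gives $P_e = 10^{-5}$, and converts the result to $E_b/N_0$ using $E = E_b\log_2 M$. Treating
the bound as exact is an approximation, and the function names and search range are
assumptions made for illustration.

```python
import math


def q_function(x):
    return 0.5 * math.erfc(x / math.sqrt(2))


def required_eb_n0_db(M, target_pe=1e-5):
    """Eb/N0 (dB) at which (M-1)*Q(sqrt(E/N0)) equals target_pe, found by bisection."""
    lo, hi = 0.0, 100.0                      # search range for E/N0 (linear scale)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (M - 1) * q_function(math.sqrt(mid)) > target_pe:
            lo = mid                         # error too high: need more E/N0
        else:
            hi = mid
    e_n0 = 0.5 * (lo + hi)
    return 10 * math.log10(e_n0 / math.log2(M))   # E = Eb * log2(M)


levels = [2, 4, 8, 16, 32, 64]
eb_n0 = [required_eb_n0_db(M) for M in levels]
for M, value in zip(levels, eb_n0):
    print(M, round(value, 2))
```

Under these assumptions the required $E_b/N_0$ decreases monotonically as $M$ grows, at the
cost of the falling bandwidth efficiency listed in Table 7.5.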

In the final analysis, the choice of M-ary PSK or M-ary FSK for binary data 
transmission over an AWGN channel is determined by the design criterion of interest: 
bandwidth efficiency or the $E_b/N_0$ needed for reliable data transmission.

Figure 7.36 Comparison of M-ary PSK with the ideal system for $P_e = 10^{-5}$.

Figure 7.37 Comparison of M-ary FSK with the ideal system for $P_e = 10^{-5}$.

Detection of Signals with Unknown Phase


Up to this point in the chapter we have assumed that the receiver is perfectly synchronized 
to the transmitter and the only channel impairment is AWGN. In practice, however, it is 
often found that, in addition to the uncertainty due to channel noise, there is also 
uncertainty due to the randomness of certain signal parameters. The usual cause of this 
uncertainty is distortion in the transmission medium. Perhaps the most common random 
signal parameter is the carrier phase, which is especially true for narrowband signals. For 
example, the transmission may take place over a multiplicity of paths of different and 
variable length, or there may be rapidly varying delays in the propagating medium from 
transmitter to receiver. These sources of uncertainty may cause the phase of the received 
signal to change in a way that the receiver cannot follow. Synchronization with the phase 
of the transmitted carrier is then too costly and the designer may simply choose to 
disregard the phase information in the received signal at the expense of some degradation 
in noise performance. A digital communication receiver with no provision made for 
carrier phase recovery is said to be noncoherent. 


Consider a binary communication system, in which the transmitted signal is defined by

$$ s_i(t) = \sqrt{\frac{2E}{T}} \cos(2\pi f_i t), \quad 0 \le t \le T \text{ and } i = 1, 2 \tag{7.210} $$

where $E$ is the signal energy, $T$ is the duration of the signaling interval, and the carrier
frequency $f_i$ for symbol $i$ is an integer multiple of $1/(2T)$. For reasons just mentioned,
the receiver operates noncoherently with respect to the transmitter, in which case the
received signal for an AWGN channel is written as

$$ x(t) = \sqrt{\frac{2E}{T}} \cos(2\pi f_i t + \theta) + w(t), \quad \text{for } 0 \le t \le T \text{ and } i = 1, 2 \tag{7.211} $$

where $\theta$ is the unknown carrier phase and, as before, $w(t)$ is the sample function of a
white Gaussian noise process of zero mean and power spectral density $N_0/2$. Assuming
complete lack of prior information about $\theta$, we may treat it as the sample value of a
random variable with uniform distribution:

$$ f_\Theta(\theta) = \begin{cases} \dfrac{1}{2\pi}, & -\pi < \theta \le \pi \\ 0, & \text{otherwise} \end{cases} \tag{7.212} $$

Such a distribution represents the worst-case scenario that could be encountered in
practice. The binary detection problem to be solved may now be stated as follows:

    Given the received signal $x(t)$ and confronted with the unknown carrier phase
    $\theta$, design an optimum receiver for detecting the symbol $s_i$ contained in $x(t)$.


Proceeding in a manner similar to that described in Section 7.4, we may formulate the
likelihood function of symbol $s_i$ given the carrier phase $\theta$ as

$$ l(s_i(\theta)) = \exp\left[\sqrt{\frac{E}{N_0 T}} \int_0^T x(t)\cos(2\pi f_i t + \theta)\,dt\right] \tag{7.213} $$

To proceed further, we have to remove dependence of $l(s_i(\theta))$ on phase $\theta$, which is
achieved by integrating it over all possible values of $\theta$, as shown by

$$ l(s_i) = \int_{-\pi}^{\pi} l(s_i(\theta)) f_\Theta(\theta)\,d\theta = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp\left[\sqrt{\frac{E}{N_0 T}} \int_0^T x(t)\cos(2\pi f_i t + \theta)\,dt\right] d\theta \tag{7.214} $$

Using a well-known trigonometric formula, we may expand the cosine term in (7.214) as

$$ \cos(2\pi f_i t + \theta) = \cos(2\pi f_i t)\cos\theta - \sin(2\pi f_i t)\sin\theta $$

Correspondingly, we may rewrite the integral in the exponent of (7.214) as

$$ \int_0^T x(t)\cos(2\pi f_i t + \theta)\,dt = \cos\theta \int_0^T x(t)\cos(2\pi f_i t)\,dt - \sin\theta \int_0^T x(t)\sin(2\pi f_i t)\,dt $$

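Define two new terms:

$$ \ell_i = \left\{ \left[\int_0^T x(t)\cos(2\pi f_i t)\,dt\right]^2 + \left[\int_0^T x(t)\sin(2\pi f_i t)\,dt\right]^2 \right\}^{1/2} \tag{7.216} $$

$$ \beta_i = \tan^{-1}\left[\frac{\int_0^T x(t)\sin(2\pi f_i t)\,dt}{\int_0^T x(t)\cos(2\pi f_i t)\,dt}\right] \tag{7.217} $$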


Then, we may go one step further and simplify the inner integral in (7.214) to

$$\int_0^T x(t)\cos(2\pi f_i t + \theta)\,dt = l_i(\cos\theta\cos\beta_i - \sin\theta\sin\beta_i) = l_i\cos(\theta + \beta_i) \tag{7.218}$$

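Accordingly, using (7.218) in (7.214), we obtain

$$\begin{aligned}
l(s_i) &= \frac{1}{2\pi}\int_{-\pi}^{\pi}\exp\left[\sqrt{\frac{E}{N_0 T}}\,l_i\cos(\theta + \beta_i)\right]d\theta \\
&= \frac{1}{2\pi}\int_{-\pi+\beta_i}^{\pi+\beta_i}\exp\left[\sqrt{\frac{E}{N_0 T}}\,l_i\cos\theta\right]d\theta \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi}\exp\left[\sqrt{\frac{E}{N_0 T}}\,l_i\cos\theta\right]d\theta
\end{aligned} \tag{7.219}$$

where, in the last line, we have used the fact that the definite integral is unaffected by the
phase $\beta_i$, because the integrand is periodic in $\theta$ with period $2\pi$.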
From Appendix C on Bessel functions, we recognize the integral of (7.219) as the
modified Bessel function of zero order, written in the compact form

$$I_0\left(\sqrt{\frac{E}{N_0 T}}\,l_i\right) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\exp\left[\sqrt{\frac{E}{N_0 T}}\,l_i\cos\theta\right]d\theta$$

Using this formula, we may correspondingly express the likelihood function for the
signal-detection problem described herein in the compact form

$$l(s_i) = I_0\left(\sqrt{\frac{E}{N_0 T}}\,l_i\right) \tag{7.221}$$
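The integral representation of the modified Bessel function used above can be checked numerically. The following is a minimal sketch, assuming only the Python standard library; the function names `bessel_i0_series` and `bessel_i0_integral` are illustrative and do not come from the text. It compares the power series of $I_0(x)$ with a midpoint-rule evaluation of $(1/2\pi)\int_{-\pi}^{\pi}\exp(x\cos\theta)\,d\theta$.

```python
import math


def bessel_i0_series(x, terms=40):
    """Power series I0(x) = sum over k of (x/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        # Ratio of consecutive series terms: (x/2)^2 / (k+1)^2
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def bessel_i0_integral(x, steps=2000):
    """(1/(2*pi)) * integral over [-pi, pi] of exp(x*cos(theta)), midpoint rule."""
    h = 2.0 * math.pi / steps
    acc = 0.0
    for n in range(steps):
        theta = -math.pi + (n + 0.5) * h
        acc += math.exp(x * math.cos(theta))
    return acc * h / (2.0 * math.pi)


if __name__ == "__main__":
    for x in (0.0, 0.5, 2.0, 5.0):
        print(x, bessel_i0_series(x), bessel_i0_integral(x))
```

The two evaluations should agree closely, and the values grow with x, which is the monotonicity property the hypothesis test relies on.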



With binary transmission as the issue of interest, there are two hypotheses to be
considered: hypothesis $H_1$, that signal $s_1(t)$ was sent, and hypothesis $H_2$, that signal $s_2(t)$ was
sent. In light of (7.221), the binary-hypothesis test may now be formulated as follows:

$$I_0\left(\sqrt{\frac{E}{N_0 T}}\,l_1\right) \underset{H_2}{\overset{H_1}{\gtrless}} I_0\left(\sqrt{\frac{E}{N_0 T}}\,l_2\right)$$

The modified Bessel function $I_0(\cdot)$ is a monotonically increasing function of its argument.
Hence, we may simplify the hypothesis test by focusing on $l_i$ for given $E/N_0T$. For
convenience of implementation, however, the simplified hypothesis test is carried out in
terms of $l_i^2$ rather than $l_i$; that is to say:

$$l_1^2 \underset{H_2}{\overset{H_1}{\gtrless}} l_2^2 \tag{7.222}$$

For obvious reasons, a receiver based on (7.222) is known as the quadrature receiver. In
light of the definition of $l_i$ given in (7.216), the receiver structure for computing $l_i$ is as
shown in Figure 7.38a. Since the test described in (7.222) is independent of the symbol
energy E, this hypothesis test is said to be uniformly most powerful with respect to E.
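The quadrature receiver test of (7.222) can be sketched in discrete time as follows. This is an illustrative sketch only: the sampling grid, the midpoint Riemann sums, the helper names (`quadrature_statistic`, `detect`, `tone`), and the tie-breaking choice are assumptions made here, not part of the text.

```python
import math


def quadrature_statistic(x, f, T):
    """Return l^2 = (int x cos)^2 + (int x sin)^2, using midpoint Riemann sums."""
    n = len(x)
    dt = T / n
    xi = sum(x[k] * math.cos(2 * math.pi * f * (k + 0.5) * dt) for k in range(n)) * dt
    xq = sum(x[k] * math.sin(2 * math.pi * f * (k + 0.5) * dt) for k in range(n)) * dt
    return xi * xi + xq * xq


def detect(x, f1, f2, T):
    """Decide symbol 1 if l1^2 > l2^2, otherwise symbol 2 (ties go to 2 in this sketch)."""
    return 1 if quadrature_statistic(x, f1, T) > quadrature_statistic(x, f2, T) else 2


def tone(f, theta, E, T, n):
    """Noise-free samples of sqrt(2E/T) * cos(2*pi*f*t + theta) at midpoints of n bins."""
    dt = T / n
    amp = math.sqrt(2 * E / T)
    return [amp * math.cos(2 * math.pi * f * (k + 0.5) * dt + theta) for k in range(n)]


if __name__ == "__main__":
    # f1 and f2 are integer multiples of 1/(2T), as the text requires.
    for theta in (0.0, 1.0, 2.5):
        x = tone(3.0, theta, E=1.0, T=1.0, n=1000)
        print(theta, detect(x, 3.0, 4.0, 1.0), quadrature_statistic(x, 3.0, 1.0))
```

In the noise-free case the statistic for the transmitted tone equals ET/2 whatever the value of the carrier phase, which is why the decision does not depend on $\theta$.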


We next derive two equivalent forms of the quadrature receiver shown in Figure 7.38a.
The first form is obtained by replacing each correlator in this receiver with a
corresponding equivalent matched filter. We thus obtain the alternative form of quadrature
receiver shown in Figure 7.38b. In one branch of this receiver, we have a filter matched to
the signal $\cos(2\pi f_i t)$ and in the other branch we have a filter matched to $\sin(2\pi f_i t)$, both of
which are defined for the signaling interval $0 \le t \le T$. At time t = T, the filter outputs are
sampled, squared, and then added together.

To obtain the second equivalent form of the quadrature receiver, suppose we have a filter
that is matched to $s(t) = \cos(2\pi f_i t + \theta)$ for $0 \le t \le T$. The envelope of the matched filter
output is obviously unaffected by the value of phase $\theta$. Therefore, we may simply choose a
matched filter with impulse response $\cos[2\pi f_i(T - t)]$, corresponding to $\theta = 0$. The output
of such a filter in response to the received signal x(t) is given by

$$\begin{aligned}
y(t) &= \int_0^T x(\tau)\cos[2\pi f_i(T - t + \tau)]\,d\tau \\
&= \cos[2\pi f_i(T - t)]\int_0^T x(\tau)\cos(2\pi f_i\tau)\,d\tau - \sin[2\pi f_i(T - t)]\int_0^T x(\tau)\sin(2\pi f_i\tau)\,d\tau
\end{aligned} \tag{7.223}$$
Figure 7.38 Noncoherent receivers: (a) quadrature receiver using correlators;
(b) quadrature receiver using matched filters; (c) noncoherent matched filter.


The envelope of the matched filter output is proportional to the square root of the sum of
the squares of the two definite integrals in (7.223). This envelope, evaluated at time t = T,
is, therefore, given by the following square root:

$$l_i = \left\{\left[\int_0^T x(\tau)\cos(2\pi f_i\tau)\,d\tau\right]^2 + \left[\int_0^T x(\tau)\sin(2\pi f_i\tau)\,d\tau\right]^2\right\}^{1/2}$$

But this is just a repeat of the output of the quadrature receiver defined earlier. Therefore,
the output (at time T) of a filter matched to the signal $\cos(2\pi f_i t + \theta)$ of arbitrary phase $\theta$,
followed by an envelope detector, is the same as the quadrature receiver's output $l_i$. This
form of receiver is shown in Figure 7.38c. The combination of matched filter and envelope
detector shown in Figure 7.38c is called a noncoherent matched filter.




Signaling over AWGN Channels 




Figure 7.39 Output of matched filter for a rectangular RF wave: (a) θ = 0; (b) θ = 180°.


The need for an envelope detector following the matched filter in Figure 7.38c may also
be justified intuitively as follows. The output of a filter matched to a rectangular RF wave
reaches a positive peak at the sampling instant t = T. If, however, the phase of the filter is
not matched to that of the signal, the peak may occur at a time different from the sampling
instant. In actual fact, if the phases differ by 180°, we get a negative peak at the sampling
instant. Figure 7.39 illustrates the matched filter output for the two limiting conditions,
θ = 0 and θ = 180°, for which the respective waveforms of the matched filter output are
displayed in parts a and b of the figure. To avoid poor sampling that arises in the absence
of prior information about the phase, it is reasonable to retain only the envelope of the
matched filter output, since it is completely independent of the phase mismatch θ.
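The effect of the phase mismatch on the sample at t = T, and its absence in the envelope, can be illustrated numerically. The sketch below assumes a noise-free unit-amplitude rectangular RF pulse and a midpoint Riemann sum; the function name `sample_and_envelope` and the values f = 5, T = 1 are hypothetical choices made for illustration.

```python
import math


def sample_and_envelope(theta, f=5.0, T=1.0, n=1000):
    """Return (matched-filter sample at t = T, envelope at t = T) for phase theta.

    The filter is matched to cos(2*pi*f*t), so by (7.223) its output at t = T
    is the in-phase correlation; the envelope combines both quadrature terms.
    """
    dt = T / n
    i_sum = 0.0
    q_sum = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        x = math.cos(2 * math.pi * f * t + theta)  # noise-free rectangular RF pulse
        i_sum += x * math.cos(2 * math.pi * f * t) * dt
        q_sum += x * math.sin(2 * math.pi * f * t) * dt
    return i_sum, math.hypot(i_sum, q_sum)


if __name__ == "__main__":
    for theta in (0.0, math.pi / 2, math.pi):
        print(theta, sample_and_envelope(theta))
```

For θ = 0 the sample is a positive peak, for θ = 180° it is a negative peak of the same size, and for θ = 90° it vanishes, whereas the envelope is the same in all three cases.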


Noncoherent Orthogonal Modulation Techniques 


With the noncoherent receiver structures of Figure 7.38 at our disposal, we may now 
proceed to study the noise performance of noncoherent orthogonal modulation that 
includes two noncoherent receivers as special cases: noncoherent binary FSK; and 
differential PSK (called DPSK), which may be viewed as the noncoherent version of 
binary PSK. 

Consider a binary signaling scheme that involves the use of two orthogonal signals $s_1(t)$
and $s_2(t)$, which have equal energy. During the signaling interval $0 \le t \le T$, where T may be
different from the bit duration $T_b$, one of these two signals is sent over an imperfect
channel that shifts the carrier phase by an unknown amount. Let $g_1(t)$ and $g_2(t)$ denote the
phase-shifted versions of $s_1(t)$ and $s_2(t)$ that result from this transmission, respectively. It is
assumed that the signals $g_1(t)$ and $g_2(t)$ remain orthogonal and have the same energy E,
regardless of the unknown carrier phase. We refer to such a signaling scheme as
noncoherent orthogonal modulation, hence the title of the section.

In addition to carrier-phase uncertainty, the channel also introduces AWGN w(t) of zero
mean and power spectral density $N_0/2$, resulting in the received signal

$$x(t) = \begin{cases} g_1(t) + w(t), & s_1(t) \text{ sent for } 0 \le t \le T \\ g_2(t) + w(t), & s_2(t) \text{ sent for } 0 \le t \le T \end{cases}$$

To tackle the signal detection problem given x(t), we employ the generalized receiver
shown in Figure 7.40a, which consists of a pair of filters matched to the transmitted
signals $s_1(t)$ and $s_2(t)$. Because the carrier phase is unknown, the receiver relies on
amplitude as the only possible discriminant. Accordingly, the matched-filter outputs are
envelope-detected, sampled, and then compared with each other. If the upper path in
Figure 7.40a has an output amplitude $l_1$ greater than the output amplitude $l_2$ of the lower
path, the receiver decides in favor of $s_1(t)$; the $l_1$ and $l_2$ used here should not be confused
with the symbol l denoting the likelihood function in the preceding section. If the converse
is true, the receiver decides in favor of $s_2(t)$. When they are equal, the decision may be
made by flipping a fair coin (i.e., randomly). In any event, a decision error occurs when
the matched filter that rejects the signal component of the received signal x(t) has a larger
output amplitude (due to noise alone) than the matched filter that passes it.

From the discussion presented in Section 7.10 we note that a noncoherent matched
filter (constituting the upper or lower path in the receiver of Figure 7.40a) may be viewed
as being equivalent to a quadrature receiver. The quadrature receiver itself has two
channels. One version of the quadrature receiver is shown in Figure 7.40b. In the upper
path, called the in-phase path, the received signal x(t) is correlated with the function
$\psi_i(t)$, which represents a scaled version of the transmitted signal $s_1(t)$ or $s_2(t)$ with zero
carrier phase. In the lower path, called the quadrature path, on the other hand, x(t) is
correlated with another function $\hat{\psi}_i(t)$, which represents the version of $\psi_i(t)$ that results
from shifting the carrier phase by -90°. The signals $\psi_i(t)$ and $\hat{\psi}_i(t)$ are orthogonal to
each other.

In actual fact, the signal $\hat{\psi}_i(t)$ is the Hilbert transform of $\psi_i(t)$; the Hilbert transform
was discussed in Chapter 2. To illustrate the nature of this relationship, let

$$\psi_i(t) = m(t)\cos(2\pi f_i t)$$

where m(t) is a band-limited message signal. Typically, the carrier frequency $f_i$ is greater than
the highest frequency component of m(t). Then the Hilbert transform $\hat{\psi}_i(t)$ is defined by

$$\hat{\psi}_i(t) = m(t)\sin(2\pi f_i t)$$

for which reference should be made to Table 2.3 of Chapter 2. Since

$$\cos\left(2\pi f_i t - \frac{\pi}{2}\right) = \sin(2\pi f_i t)$$

we see that $\hat{\psi}_i(t)$ is indeed obtained from $\psi_i(t)$ by shifting the carrier $\cos(2\pi f_i t)$ by -90°.
An important property of Hilbert transformation is that a signal and its Hilbert transform are
orthogonal to each other. Thus, $\psi_i(t)$ and $\hat{\psi}_i(t)$ are indeed orthogonal to each other, as
already stated.

The average probability of error for the noncoherent receiver of Figure 7.40a is given
by the simple formula

$$P_e = \frac{1}{2}\exp\left(-\frac{E}{2N_0}\right) \tag{7.227}$$

where E is the signal energy per symbol and $N_0/2$ is the noise spectral density.


To derive Equation (7.227) we make use of the equivalence depicted in Figure 7.40. In
particular, we observe that, since the carrier phase is unknown, noise at the output of each
matched filter in Figure 7.40a has two degrees of freedom: in-phase and quadrature.
Accordingly, the noncoherent receiver of Figure 7.40a has a total of four noisy parameters
that are conditionally independent given the phase $\theta$, and also identically distributed. These
four noisy parameters have sample values denoted by $x_{I1}$, $x_{Q1}$, $x_{I2}$, and $x_{Q2}$; the first
two account for degrees of freedom associated with the upper path of Figure 7.40a, and the
latter two account for degrees of freedom associated with the lower path of the figure.

Figure 7.40 (a) Generalized binary receiver for noncoherent orthogonal modulation. (b) Quadrature
receiver equivalent to either one of the two matched filters in (a); the index i = 1, 2.

The receiver of Figure 7.40a has a symmetric structure, meaning that the probability of
choosing $s_2(t)$ given that $s_1(t)$ was transmitted is the same as the probability of choosing
$s_1(t)$ given that $s_2(t)$ was transmitted. In other words, the average probability of error may
be obtained by transmitting $s_1(t)$ and calculating the probability of choosing $s_2(t)$, or vice
versa; it is assumed that the original binary symbols and therefore $s_1(t)$ and $s_2(t)$ are
equiprobable.

Suppose that signal $s_1(t)$ is transmitted for the interval $0 \le t \le T$. An error occurs if the
channel noise w(t) is such that the output $l_2$ of the lower path in Figure 7.40a is greater than
the output $l_1$ of the upper path. Then, the receiver decides in favor of $s_2(t)$ rather than $s_1(t)$.
To calculate the probability of error so made, we must have the probability density function
of the random variable $L_2$ (represented by sample value $l_2$). Since the filter in the lower
path is matched to $s_2(t)$ and $s_2(t)$ is orthogonal to the transmitted signal $s_1(t)$, it follows that
the output of this matched filter is due to noise alone. Let $x_{I2}$ and $x_{Q2}$ denote the in-phase
and quadrature components of the matched filter output in the lower path of Figure 7.40a.
Then, from the equivalent structure depicted in this figure, we see that (for i = 2)

$$l_2 = \sqrt{x_{I2}^2 + x_{Q2}^2} \tag{7.228}$$


Figure 7.41a shows a geometric interpretation of this relation. The channel noise w(t) is
both white (with power spectral density $N_0/2$) and Gaussian (with zero mean).
Correspondingly, we find that the random variables $X_{I2}$ and $X_{Q2}$ (represented by sample values
$x_{I2}$ and $x_{Q2}$) are both Gaussian distributed with zero mean and variance $N_0/2$, given the
phase $\theta$. Hence, we may write

$$f_{X_{I2}}(x_{I2}) = \frac{1}{\sqrt{\pi N_0}}\exp\left(-\frac{x_{I2}^2}{N_0}\right)$$

and

$$f_{X_{Q2}}(x_{Q2}) = \frac{1}{\sqrt{\pi N_0}}\exp\left(-\frac{x_{Q2}^2}{N_0}\right)$$

Figure 7.41 Geometric interpretations of the two path outputs $l_1$ and $l_2$
in the generalized noncoherent receiver.
Next, we use the well-known property presented in Chapter 4 on stochastic processes: the
envelope of a Gaussian process represented in polar form is Rayleigh distributed and
independent of the phase $\theta$. For the situation at hand, therefore, we may state that the
random variable $L_2$ whose sample value $l_2$ is related to $x_{I2}$ and $x_{Q2}$ by (7.228) has the
following probability density function:

$$f_{L_2}(l_2) = \begin{cases} \dfrac{2l_2}{N_0}\exp\left(-\dfrac{l_2^2}{N_0}\right), & l_2 \ge 0 \\ 0, & \text{elsewhere} \end{cases} \tag{7.231}$$

Figure 7.42 shows a plot of this probability density function, where the shaded area
defines the conditional probability that $l_2 > l_1$. Hence, we have

$$P(l_2 > l_1 \mid l_1) = \int_{l_1}^{\infty} f_{L_2}(l_2)\,dl_2 \tag{7.232}$$

Substituting (7.231) into (7.232) and integrating, we get

$$P(l_2 > l_1 \mid l_1) = \exp\left(-\frac{l_1^2}{N_0}\right) \tag{7.233}$$
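The integration step behind (7.233) can be written out explicitly. In the worked sketch below, the substitution variable u is introduced only for illustration and does not appear in the text:

```latex
\begin{aligned}
P(l_2 > l_1 \mid l_1)
  &= \int_{l_1}^{\infty} \frac{2 l_2}{N_0}\exp\!\left(-\frac{l_2^2}{N_0}\right) dl_2
  && \text{Rayleigh density of (7.231)} \\
  &= \int_{l_1^2/N_0}^{\infty} e^{-u}\, du
  && u = \frac{l_2^2}{N_0},\quad du = \frac{2 l_2}{N_0}\, dl_2 \\
  &= \exp\!\left(-\frac{l_1^2}{N_0}\right)
\end{aligned}
```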


Consider next the output amplitude $l_1$, pertaining to the upper path in Figure 7.40a. Since
the filter in this path is matched to $s_1(t)$ and it is assumed that $s_1(t)$ is transmitted, it follows
that $l_1$ is due to signal plus noise. Let $x_{I1}$ and $x_{Q1}$ denote the components at the output of
the matched filter in the upper path of Figure 7.40a that are in phase and in quadrature
with respect to the received signal, respectively. Then, from the equivalent structure
depicted in Figure 7.40b, we see that, for i = 1,

$$l_1 = \sqrt{x_{I1}^2 + x_{Q1}^2} \tag{7.234}$$

Figure 7.42 Calculation of the conditional probability
that $l_2 > l_1$, given $l_1$.
A geometric interpretation of $l_1$ is presented in Figure 7.41b. Since a Fourier-transformable
signal and its Hilbert transform form an orthogonal pair, it follows that $x_{I1}$ is due to signal
plus noise, whereas $x_{Q1}$ is due to noise alone. This statement has two implications:

• The random variable $X_{I1}$ represented by the sample value $x_{I1}$ is Gaussian distributed
with mean $\sqrt{E}$ and variance $N_0/2$, where E is the signal energy per symbol.

• The random variable $X_{Q1}$ represented by the sample value $x_{Q1}$ is Gaussian distributed
with zero mean and variance $N_0/2$.

Hence, we may express the probability density functions of these two independent random
variables as

$$f_{X_{I1}}(x_{I1}) = \frac{1}{\sqrt{\pi N_0}}\exp\left[-\frac{(x_{I1} - \sqrt{E})^2}{N_0}\right] \tag{7.235}$$

and

$$f_{X_{Q1}}(x_{Q1}) = \frac{1}{\sqrt{\pi N_0}}\exp\left(-\frac{x_{Q1}^2}{N_0}\right) \tag{7.236}$$

respectively. Since the two random variables $X_{I1}$ and $X_{Q1}$ are statistically independent,
their joint probability density function is simply the product of the probability density
functions given in (7.235) and (7.236).

To find the average probability of error, we have to average the conditional probability
of error given in (7.233) over all possible values of $l_1$. Naturally, this calculation requires
knowledge of the probability density function of the random variable $L_1$ represented by
sample value $l_1$. The standard method is now to combine (7.235) and (7.236) to find the
probability density function of $L_1$ due to signal plus noise. However, this leads to rather
complicated calculations involving the use of Bessel functions. This analytic difficulty
may be circumvented by the following approach. Given $x_{I1}$ and $x_{Q1}$, an error occurs when,
in Figure 7.40a, the lower path's output amplitude $l_2$ due to noise alone exceeds $l_1$ due to
signal plus noise; squaring both sides of (7.234), we write

$$l_1^2 = x_{I1}^2 + x_{Q1}^2 \tag{7.237}$$

The probability of the occurrence just described is obtained by substituting (7.237) into
(7.233):

$$P(\text{error} \mid x_{I1}, x_{Q1}) = \exp\left(-\frac{x_{I1}^2 + x_{Q1}^2}{N_0}\right)$$

which is a probability of error conditioned on the output of the matched filter in the upper path
of Figure 7.40a taking on the sample values $x_{I1}$ and $x_{Q1}$. This conditional probability
multiplied by the joint probability density function of the random variables $X_{I1}$ and $X_{Q1}$ is the
error-density given $x_{I1}$ and $x_{Q1}$. Since $X_{I1}$ and $X_{Q1}$ are statistically independent, their joint
probability density function equals the product of their individual probability density
functions. The resulting error-density is a complicated expression in $x_{I1}$ and $x_{Q1}$. However, the
average probability of error, which is the issue of interest, may be obtained in a relatively
simple manner. We first use (7.234), (7.235), and (7.236) to evaluate the desired error-density as

$$P(\text{error} \mid x_{I1}, x_{Q1})\,f_{X_{I1}}(x_{I1})\,f_{X_{Q1}}(x_{Q1}) = \frac{1}{\pi N_0}\exp\left\{-\frac{1}{N_0}\left[x_{I1}^2 + x_{Q1}^2 + (x_{I1} - \sqrt{E})^2 + x_{Q1}^2\right]\right\} \tag{7.239}$$



Completing the square in the exponent of (7.239) without the scaling factor $-1/N_0$, we
may rewrite it as follows:

$$x_{I1}^2 + x_{Q1}^2 + (x_{I1} - \sqrt{E})^2 + x_{Q1}^2 = 2\left(x_{I1} - \frac{\sqrt{E}}{2}\right)^2 + 2x_{Q1}^2 + \frac{E}{2} \tag{7.240}$$
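The algebra behind this step can be checked by expanding the terms in $x_{I1}$; the two $x_{Q1}^2$ terms simply add to $2x_{Q1}^2$. A worked sketch:

```latex
\begin{aligned}
x_{I1}^2 + (x_{I1} - \sqrt{E})^2
  &= 2x_{I1}^2 - 2\sqrt{E}\,x_{I1} + E \\
  &= 2\left(x_{I1}^2 - \sqrt{E}\,x_{I1} + \frac{E}{4}\right) + \frac{E}{2} \\
  &= 2\left(x_{I1} - \frac{\sqrt{E}}{2}\right)^2 + \frac{E}{2}
\end{aligned}
```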


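Next, we substitute (7.240) into (7.239) and integrate the error-density over all possible
values of $x_{I1}$ and $x_{Q1}$, thereby obtaining the average probability of error:

$$\begin{aligned}
P_e &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} P(\text{error} \mid x_{I1}, x_{Q1})\,f_{X_{I1}}(x_{I1})\,f_{X_{Q1}}(x_{Q1})\,dx_{I1}\,dx_{Q1} \\
&= \frac{1}{\pi N_0}\exp\left(-\frac{E}{2N_0}\right)\int_{-\infty}^{\infty}\exp\left[-\frac{2}{N_0}\left(x_{I1} - \frac{\sqrt{E}}{2}\right)^2\right]dx_{I1}\int_{-\infty}^{\infty}\exp\left(-\frac{2x_{Q1}^2}{N_0}\right)dx_{Q1}
\end{aligned} \tag{7.241}$$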


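We now use the following two identities:

$$\int_{-\infty}^{\infty}\exp\left[-\frac{2}{N_0}\left(x_{I1} - \frac{\sqrt{E}}{2}\right)^2\right]dx_{I1} = \sqrt{\frac{\pi N_0}{2}} \tag{7.242}$$

and

$$\int_{-\infty}^{\infty}\exp\left(-\frac{2x_{Q1}^2}{N_0}\right)dx_{Q1} = \sqrt{\frac{\pi N_0}{2}} \tag{7.243}$$

The identity of (7.242) is obtained by considering a Gaussian-distributed variable with
mean $\sqrt{E}/2$ and variance $N_0/4$ and recognizing the fact that the total area under the curve
of a random variable's probability density function is unity. The identity of (7.243) follows
as a special case of (7.242). Thus, in light of these two identities, (7.241) reduces to

$$P_e = \frac{1}{2}\exp\left(-\frac{E}{2N_0}\right)$$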



which is the desired result presented previously as (7.227). With this formula at our 
disposal, we are ready to consider noncoherent binary FSK and DPSK as special cases, 
which we do next in that order. 
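The closed-form result can be compared against a Monte Carlo experiment built directly on the four Gaussian variables used in the derivation: $x_{I1}$ with mean $\sqrt{E}$, and $x_{Q1}$, $x_{I2}$, $x_{Q2}$ with zero mean, all with variance $N_0/2$. The sketch below uses only the Python standard library; the function names, the seed, and the trial count are arbitrary choices made here for illustration.

```python
import math
import random


def simulate_pe(E, N0, trials, seed=1):
    """Estimate P(l2 > l1) when s1(t) is sent, using the four-variable model."""
    rng = random.Random(seed)
    sigma = math.sqrt(N0 / 2.0)  # each component has variance N0/2
    errors = 0
    for _ in range(trials):
        x_i1 = rng.gauss(math.sqrt(E), sigma)  # signal plus noise
        x_q1 = rng.gauss(0.0, sigma)           # noise alone
        x_i2 = rng.gauss(0.0, sigma)           # noise alone
        x_q2 = rng.gauss(0.0, sigma)           # noise alone
        # Comparing squared envelopes is equivalent to comparing l2 with l1.
        if x_i2 ** 2 + x_q2 ** 2 > x_i1 ** 2 + x_q1 ** 2:
            errors += 1
    return errors / trials


def pe_theory(E, N0):
    """Average probability of error of (7.227): 0.5 * exp(-E / (2 * N0))."""
    return 0.5 * math.exp(-E / (2.0 * N0))


if __name__ == "__main__":
    E, N0 = 4.0, 1.0
    print("simulated:", simulate_pe(E, N0, trials=100000))
    print("theory:   ", pe_theory(E, N0))
```

With a finite number of trials the simulated value only approximates the formula; the agreement tightens as the trial count grows.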


Binary Frequency-Shift Keying Using Noncoherent Detection 


In binary FSK, the transmitted signal is defined in (7.151) and repeated here for
convenience of presentation:

$$s_i(t) = \begin{cases} \sqrt{\dfrac{2E_b}{T_b}}\cos(2\pi f_i t), & 0 \le t \le T_b \\ 0, & \text{elsewhere} \end{cases}$$

where $T_b$ is the bit duration and the carrier frequency $f_i$ equals one of two possible values
$f_1$ and $f_2$; to ensure that the signals representing these two frequencies are orthogonal, we
choose $f_i = n_i/T_b$, where $n_i$ is an integer. The transmission of frequency $f_1$ represents
symbol 1 and the transmission of frequency $f_2$ represents symbol 0. For the noncoherent
detection of this frequency-modulated signal, the receiver consists of a pair of matched
filters followed by envelope detectors, as in Figure 7.43. The filter in the upper path of the
receiver is matched to $\cos(2\pi f_1 t)$ and the filter in the lower path is matched to $\cos(2\pi f_2 t)$
for the signaling interval $0 \le t \le T_b$. The resulting envelope detector outputs are sampled at
$t = T_b$ and their values are compared. The envelope samples of the upper and lower paths
in Figure 7.43 are shown as $l_1$ and $l_2$. The receiver decides in favor of symbol 1 if $l_1 > l_2$
and in favor of symbol 0 if $l_1 < l_2$. If $l_1 = l_2$, the receiver simply guesses randomly in favor
of symbol 1 or 0.

Figure 7.43 Noncoherent receiver for the detection of binary FSK signals.

The noncoherent binary FSK described herein is a special case of noncoherent
orthogonal modulation with $T = T_b$ and $E = E_b$, where $E_b$ is the signal energy per bit.
Hence, the BER for noncoherent binary FSK is

$$P_e = \frac{1}{2}\exp\left(-\frac{E_b}{2N_0}\right)$$

which follows directly from (7.227) as a special case of noncoherent orthogonal
modulation.


Differential Phase-Shift Keying 


As remarked at the beginning of Section 7.9, we may view DPSK as the “noncoherent” 
version of binary PSK. The distinguishing feature of DPSK is that it eliminates the need 
for synchronizing the receiver to the transmitter by combining two basic operations at the 
transmitter: 

• differential encoding of the input binary sequence and 

• PSK of the encoded sequence, 

from which the name of this new binary signaling scheme follows. 
Differential encoding starts with an arbitrary first bit, serving as the reference bit; to
this end, symbol 1 is used as the reference bit. Generation of the differentially encoded
sequence then proceeds in accordance with a two-part encoding rule as follows:

1. If the new bit at the transmitter input is 1, leave the differentially encoded symbol
unchanged with respect to the current bit.

2. If, on the other hand, the input bit is 0, change the differentially encoded symbol
with respect to the current bit.

The differentially encoded sequence, denoted by $\{d_k\}$, is used to shift the sinusoidal
carrier phase by zero and 180°, representing symbols 1 and 0, respectively. Thus, in terms
of phase-shifts, the resulting DPSK signal follows the two-part rule:

1. To send symbol 1, the phase of the DPSK signal remains unchanged.

2. To send symbol 0, the phase of the DPSK signal is shifted by 180°.


Illustration of DPSK 

Consider the input binary sequence, denoted $\{b_k\}$, to be 10010011, which is used to
derive the generation of a DPSK signal. The differentially encoded process starts with the
reference bit 1. Let $\{d_k\}$ denote the differentially encoded sequence starting in this
manner and $\{d_{k-1}\}$ denote its delayed version by one bit. The complement of the
modulo-2 sum of $\{b_k\}$ and $\{d_{k-1}\}$ defines the desired $\{d_k\}$, as illustrated in the top three
lines of Table 7.6. In the last line of this table, binary symbols 1 and 0 are represented by
phase-shifts of 0 and π radians.

Table 7.6 Illustrating the generation of DPSK signal

                                        Reference
{b_k}                                               1   0   0   1   0   0   1   1
{d_{k-1}}                                           1   1   0   1   1   0   1   1
Differentially encoded sequence {d_k}       1       1   0   1   1   0   1   1   1
Transmitted phase (radians)                 0       0   π   0   0   π   0   0   0
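The encoding rule of this example can be reproduced in a few lines. The sketch below is illustrative: the function names `differential_encode` and `phases` are hypothetical, and the phase labels are returned as strings purely for display.

```python
def differential_encode(bits, reference=1):
    """Return [reference, d_1, ..., d_n], where d_k = NOT(b_k XOR d_{k-1})."""
    d = [reference]
    for b in bits:
        # Input 1 leaves the encoded symbol unchanged; input 0 flips it.
        d.append(1 - (b ^ d[-1]))
    return d


def phases(d):
    """Map encoded symbol 1 to phase 0 and symbol 0 to phase pi (as labels)."""
    return ["0" if s == 1 else "pi" for s in d]


if __name__ == "__main__":
    b = [1, 0, 0, 1, 0, 0, 1, 1]
    d = differential_encode(b)
    print(d)          # encoded sequence, reference bit first
    print(phases(d))  # transmitted phases
```

Applied to the input sequence 10010011 with reference bit 1, the sketch yields the encoded sequence and phases listed in the last two rows of the table.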


Basically, the DPSK is also an example of noncoherent orthogonal modulation when its
behavior is considered over successive two-bit intervals; that is, $0 \le t \le 2T_b$. To
elaborate, let the transmitted DPSK signal be $\sqrt{2E_b/T_b}\cos(2\pi f_c t)$ for the first-bit
interval $0 \le t \le T_b$, which corresponds to symbol 1. Suppose, then, the input symbol for
the second-bit interval $T_b \le t \le 2T_b$ is also symbol 1. According to part 1 of the DPSK
encoding rule, the carrier phase remains unchanged, thereby yielding the DPSK signal

$$s_1(t) = \begin{cases} \sqrt{\dfrac{2E_b}{T_b}}\cos(2\pi f_c t), & \text{symbol 1 for } 0 \le t \le T_b \\ \sqrt{\dfrac{2E_b}{T_b}}\cos(2\pi f_c t), & \text{symbol 1 for } T_b \le t \le 2T_b \end{cases} \tag{7.246}$$

Suppose, next, the signaling over the two-bit interval changes such that the symbol at the
transmitter input for the second-bit interval $T_b \le t \le 2T_b$ is 0. Then, according to part 2 of
the DPSK encoding rule, the carrier phase is shifted by π radians (i.e., 180°), thereby
yielding the new DPSK signal

$$s_2(t) = \begin{cases} \sqrt{\dfrac{2E_b}{T_b}}\cos(2\pi f_c t), & \text{symbol 1 for } 0 \le t \le T_b \\ \sqrt{\dfrac{2E_b}{T_b}}\cos(2\pi f_c t + \pi), & \text{symbol 0 for } T_b \le t \le 2T_b \end{cases} \tag{7.247}$$

We now readily see from (7.246) and (7.247) that $s_1(t)$ and $s_2(t)$ are indeed orthogonal
over the two-bit interval $0 \le t \le 2T_b$, which confirms that DPSK is indeed a special form of
noncoherent orthogonal modulation with one difference compared with the case of binary
FSK: for DPSK, we have $T = 2T_b$ and $E = 2E_b$. Hence, using (7.227), we find that the BER
for DPSK is given by

$$P_e = \frac{1}{2}\exp\left(-\frac{E_b}{N_0}\right)$$

According to this formula, DPSK provides a gain of 3 dB over binary FSK using
noncoherent detection for the same $E_b/N_0$.
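The 3 dB figure can be verified by inverting the two BER formulas for a target error rate. This is a small sketch under the formulas stated in this section; the helper names are hypothetical.

```python
import math


def required_ebn0_dpsk(pe):
    """Eb/N0 (linear) solving pe = 0.5 * exp(-Eb/N0)."""
    return -math.log(2.0 * pe)


def required_ebn0_ncfsk(pe):
    """Eb/N0 (linear) solving pe = 0.5 * exp(-Eb/(2*N0))."""
    return -2.0 * math.log(2.0 * pe)


def to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    for pe in (1e-3, 1e-5):
        gap = to_db(required_ebn0_ncfsk(pe)) - to_db(required_ebn0_dpsk(pe))
        print(pe, round(gap, 4))
```

Because the FSK exponent is exactly half the DPSK exponent, the required $E_b/N_0$ differs by a factor of 2, that is, $10\log_{10}2 \approx 3.01$ dB, whatever target BER is chosen.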


Figure 7.44 shows the block diagram of the DPSK transmitter. To be specific, the
transmitter consists of two functional blocks:

• Logic network and one-bit delay (storage) element, which are interconnected so as
to convert the raw input binary sequence $\{b_k\}$ into the differentially encoded
sequence $\{d_k\}$.

• Binary PSK modulator, the output of which is the desired DPSK signal.


Figure 7.44 Block diagram of a DPSK transmitter.

In the use of DPSK, the carrier phase $\theta$ is unknown, which complicates the received signal
x(t). To deal with the unknown phase $\theta$ in the differentially coherent detection of the
DPSK signal in x(t), we equip the receiver with an in-phase and a quadrature path. We thus
have a signal-space diagram where the received signal points over the two-bit interval
$0 \le t \le 2T_b$ are defined by $(A\cos\theta, A\sin\theta)$ and $(-A\cos\theta, -A\sin\theta)$, where A denotes the
carrier amplitude.

This geometry of possible signals is illustrated in Figure 7.45. For the two-bit interval
$0 \le t \le 2T_b$, the receiver measures the coordinates $(x_{I0}, x_{Q0})$ first, at time $t = T_b$, and then
measures $(x_{I1}, x_{Q1})$ at time $t = 2T_b$. The issue to be resolved is whether these two points map
to the same signal point or different ones. Recognizing that the vectors $\mathbf{x}_0$ and $\mathbf{x}_1$, with end
points $(x_{I0}, x_{Q0})$ and $(x_{I1}, x_{Q1})$, respectively, point roughly in the same direction if their
inner product is positive, we may formulate the binary-hypothesis test with a question:

Is the inner product $\mathbf{x}_0^T\mathbf{x}_1$ positive or negative?

Expressing this statement in analytic terms, we may write

$$x_{I0}x_{I1} + x_{Q0}x_{Q1} \underset{\text{say } 0}{\overset{\text{say } 1}{\gtrless}} 0 \tag{7.249}$$

where the threshold is zero for equiprobable symbols.
We now note the following identity:

$$x_{I0}x_{I1} + x_{Q0}x_{Q1} = \frac{1}{4}\left[(x_{I0} + x_{I1})^2 - (x_{I0} - x_{I1})^2 + (x_{Q0} + x_{Q1})^2 - (x_{Q0} - x_{Q1})^2\right]$$

Figure 7.45 Signal-space diagram of received DPSK signal.
Figure 7.46 Block diagram of a DPSK receiver, with in-phase and quadrature channels; the
decision device says 1 if y > 0 and 0 if y < 0.

Hence, substituting this identity into (7.249), we get the equivalent test:

$$(x_{I0} + x_{I1})^2 + (x_{Q0} + x_{Q1})^2 - (x_{I0} - x_{I1})^2 - (x_{Q0} - x_{Q1})^2 \underset{\text{say } 0}{\overset{\text{say } 1}{\gtrless}} 0 \tag{7.250}$$

where the scaling factor 1/4 is ignored. In light of this equation, the question on the binary
hypothesis test for the detection of DPSK may now be restated as follows:

Is the received point $(x_{I0}, x_{Q0})$ closer to the point $(x_{I1}, x_{Q1})$ or to its image $(-x_{I1}, -x_{Q1})$?

Thus, the optimum receiver for the detection of binary DPSK is as shown in Figure 7.46,
the formulation of which follows directly from the binary hypothesis test of (7.250). This
implementation is simple, in that it merely requires that sample values be stored.
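The two equivalent forms of the DPSK decision rule can be sketched as follows. The example is noise-free and illustrative: the function names, the unit amplitude, and the choice to resolve a zero-valued test statistic in favor of symbol 0 are assumptions made here.

```python
import math


def dpsk_decide(p0, p1):
    """Inner-product test of (7.249): 1 if x_I0*x_I1 + x_Q0*x_Q1 > 0, else 0."""
    (xi0, xq0), (xi1, xq1) = p0, p1
    return 1 if xi0 * xi1 + xq0 * xq1 > 0 else 0


def dpsk_decide_sumdiff(p0, p1):
    """Equivalent sum/difference test of (7.250); y equals 4 times the inner product."""
    (xi0, xq0), (xi1, xq1) = p0, p1
    y = ((xi0 + xi1) ** 2 + (xq0 + xq1) ** 2
         - (xi0 - xi1) ** 2 - (xq0 - xq1) ** 2)
    return 1 if y > 0 else 0


def received_points(d, theta, A=1.0):
    """Noise-free points: symbol 1 -> phase 0, symbol 0 -> phase pi, all rotated by theta."""
    pts = []
    for s in d:
        sign = 1.0 if s == 1 else -1.0
        pts.append((sign * A * math.cos(theta), sign * A * math.sin(theta)))
    return pts


def decode(points, rule=dpsk_decide):
    """Compare each point with its predecessor to recover the input bits."""
    return [rule(points[k - 1], points[k]) for k in range(1, len(points))]


if __name__ == "__main__":
    d = [1, 1, 0, 1, 1, 0, 1, 1, 1]  # encoded sequence, reference bit first
    for theta in (0.3, 2.0, -1.2):
        print(theta, decode(received_points(d, theta)))
```

For the encoded sequence of Table 7.6, both rules recover the input bits 10010011 regardless of the unknown rotation θ, since only the relative phase between consecutive points enters the test.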

The receiver of Figure 7.46 is said to be optimum for two reasons: 

In structural terms, the receiver avoids the use of fancy delay lines that could be 
needed otherwise. 

In operational terms, the receiver makes the decoding analysis straightforward to
handle, in that the two signals to be considered are orthogonal over the interval
$[0, 2T_b]$ in accordance with the formula of (7.227).


BER Comparison of Signaling Schemes over AWGN Channels 


Much of the material covered in this chapter has been devoted to digital modulation
schemes operating over AWGN channels. In this section, we present a summary of the
BERs of some popular digital modulation schemes, classified into two categories,
depending on the method of detection used in the receiver:

Class I: Coherent detection 

• binary PSK: two symbols, single carrier 

• binary FSK: two symbols, two carriers one for each symbol 

• QPSK: four symbols, single carrier — the QPSK also includes the QAM, employing 
four symbols as a special case 

• MSK: four symbols, two carriers. 

Class II: Noncoherent detection 

• DPSK: two symbols, single carrier 

• binary FSK: two symbols, two carriers. 

Table 7.7 presents a summary of the formulas of the BERs of these schemes separated
under Classes I and II. All the formulas are defined in terms of the ratio of energy per bit
to the noise spectral density, $E_b/N_0$, as summarized herein:

Under Class I, the formulas are expressed in terms of the Q-function. This function
is defined as the area under the tail end of the standard Gaussian distribution with
zero mean and unit variance; the lower limit in the integral defining the Q-function
is dependent solely on $E_b/N_0$, scaled by the factor 2 for binary PSK, QPSK, and
MSK. Naturally, as this SNR is increased, the area under the Q-function is
reduced and with it the BER is correspondingly reduced.

Under Class II, the formulas are expressed in terms of an exponential function,
where the negative exponent depends on the $E_b/N_0$ ratio for DPSK and its scaled
version by the factor 1/2 for binary FSK. Here again, as the $E_b/N_0$ is increased, the
BER is correspondingly reduced.
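The BER formulas of the two classes can be evaluated side by side. The sketch below uses the standard relation $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$ and Python's `math.erfc`; the scheme labels passed to the hypothetical `ber` function are naming choices made here, not notation from the text.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ber(scheme, ebn0_db):
    """BER for the listed schemes at a given Eb/N0 in dB."""
    g = 10.0 ** (ebn0_db / 10.0)  # Eb/N0 as a linear ratio
    if scheme in ("bpsk", "qpsk", "msk"):       # Class I, coherent
        return q_function(math.sqrt(2.0 * g))
    if scheme == "coherent_fsk":                # Class I, coherent
        return q_function(math.sqrt(g))
    if scheme == "dpsk":                        # Class II, noncoherent
        return 0.5 * math.exp(-g)
    if scheme == "noncoherent_fsk":             # Class II, noncoherent
        return 0.5 * math.exp(-g / 2.0)
    raise ValueError("unknown scheme: " + scheme)


if __name__ == "__main__":
    for name in ("bpsk", "dpsk", "coherent_fsk", "noncoherent_fsk"):
        print(name, ber(name, 10.0))
```

At any fixed $E_b/N_0$ the printed values show each coherent scheme achieving a smaller BER than its noncoherent counterpart, and all values fall as $E_b/N_0$ rises.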

Table 7.7 Formulas for the BER of digital modulation schemes
employing two or four symbols

Signaling scheme                        BER
I. Coherent detection
   Binary PSK, QPSK, MSK                $Q(\sqrt{2E_b/N_0})$
   Binary FSK                           $Q(\sqrt{E_b/N_0})$
II. Noncoherent detection
   DPSK                                 $\frac{1}{2}\exp(-E_b/N_0)$
   Binary FSK                           $\frac{1}{2}\exp(-E_b/2N_0)$

Figure 7.47 Comparison of the noise performance of different PSK and FSK schemes.

The performance curves of the digital modulation schemes listed in Table 7.7 are shown in
Figure 7.47, where the BER is plotted versus $E_b/N_0$. As expected, the BERs for all the
schemes decrease monotonically with increasing $E_b/N_0$, with all the graphs having a
similar shape in the form of a waterfall. Moreover, we can make the following
observations from Figure 7.47:

For any value of $E_b/N_0$, the schemes using coherent detection produce a smaller
BER than those using noncoherent detection, which is intuitively satisfying.

PSK schemes employing two symbols, namely binary PSK with coherent detection
and DPSK with noncoherent detection, require an $E_b/N_0$ that is 3 dB less than their
FSK counterpart to realize the same BER.

At high values of $E_b/N_0$, DPSK and binary FSK using noncoherent detection
perform almost as well, to within about 1 dB of their respective counterparts using
coherent detection for the same BER.

Although under Class I the BER for binary PSK, QPSK, and MSK is governed by
the same formula, there are important differences between them:

• For the same channel bandwidth and BER, the QPSK accommodates the 
transmission of binary data at twice the rate attainable with binary PSK; in other 
words, QPSK is bandwidth conserving. 

• When sensitivity to interfering signals is an issue of practical concern, as in 
wireless communications, MSK is preferred over QPSK. 


Synchronization 


The coherent reception of a digitally modulated signal, discussed in previous sections of 
this chapter, requires that the receiver be synchronous with the transmitter. In this context, 
we define the process of synchronization as follows: 


There are two basic modes of synchronization: 

Carrier synchronization. When coherent detection is used in signaling over AWGN 
channels via the modulation of a sinusoidal carrier, knowledge of both the frequency 
and phase of the carrier is necessary. The process of estimating the carrier phase and 
frequency is called carrier recovery or carrier synchronization ; in what follows, 
both terminologies are used interchangeably. 

Symbol synchronization. To perform demodulation, the receiver has to know the instants of time at which the
modulation in the transmitter changes its state. That is, the receiver has to know the 
starting and finishing times of the individual symbols, so that it may determine when 
to sample and when to quench the product-integrators. The estimation of these times is 
called clock recovery or symbol synchronization ; here again, both terminologies are 
used interchangeably. 

We may classify synchronization schemes as follows, depending on whether some form of 
aiding is used or not: 

Data-aided synchronization. In data-aided synchronization schemes, a preamble is
transmitted along with the data-bearing signal in a time-multiplexed manner on a 
periodic basis. The preamble contains information about the symbol timing, which 
is extracted by appropriate processing of the channel output at the receiver. Such an 
approach is commonly used in digital satellite and wireless communications, where 
the motivation is to minimize the time required to synchronize the receiver to the 
transmitter. Limitations of data-aided synchronization are twofold: 

• reduced data-throughput efficiency, which is incurred by assigning a certain 
portion of each transmitted frame to the preamble, and 

• reduced power efficiency, which results from the allocation of a certain fraction 
of the transmitted power to the transmission of the preamble. 



Nondata-aided synchronization. In this second approach, the use of a preamble is 
avoided and the receiver has the task of establishing synchronization by extracting 
the necessary information from the noisy distorted modulated signal at the channel 
output. Both throughput and power efficiency are thereby improved, but at the 
expense of an increase in the time taken to establish synchronization. 

In this section, the discussion is focused on nondata-aided forms of carrier and clock 
recovery schemes. To be more specific, we adopt an algorithmic approach, so called 
because implementation of the synchronizer enables the receiver to estimate the carrier 
phase and symbol timing in a recursive manner from one time instant to another. The 
processing is performed on the baseband version of the received signal, using 
discrete-time (digital) signal-processing algorithms. 


Maximum likelihood decoding played a key role in much of the material on signaling 
techniques in AWGN channels presented in Sections 7.4 through 7.13. Maximum 
likelihood parameter estimation plays a key role of its own in the algorithmic approach to 
synchronization. Both of these methods were discussed previously in Chapter 3 on 
probability theory and Bayesian inference. In this context, it may therefore be said that a 
sense of continuity is being maintained throughout this chapter. 

Given the received signal, the maximum likelihood method is used to estimate two 
parameters: carrier phase and symbol timing, both of which are, of course, unknown. 
Here, we are assuming that knowledge of the carrier frequency is available at the receiver. 

Moreover, in the algorithmic approach, the symbol-timing recovery is performed 
before phase recovery. The rationale for proceeding in this way is that once we know the 
envelope delay incurred by signal transmission through a dispersive channel, then one 
sample per symbol at the matched filter output may be sufficient for estimating the 
unknown carrier phase. Moreover, computational complexity of the receiver is minimized 
by using synchronization algorithms that operate at the symbol rate 1/T. 

In light of the remarks just made, we will develop the algorithmic approach to 
synchronization by proceeding as follows: 

Through processing the received signal corrupted by channel noise and channel 
dispersion, the likelihood function is formulated. 

The likelihood function is maximized to recover the clock. 

With clock recovery achieved, the next step is to maximize the likelihood function to 
recover the carrier. 

The derivations presented in this chapter focus on the QPSK signal. The resulting 
formulas may be readily extended to binary PSK symbols as a special case and 
generalized for M-ary PSK signals. 

Recursive Maximum Likelihood Estimation for Synchronization 


In the previous section, we remarked that, in algorithmic synchronization, estimation of 
the two unknown parameters, namely carrier phase and symbol timing, is performed in a 
recursive manner from one time instant to another. 
In other words: 


Moreover, the estimation is performed at time $t = nT$, where $n$ is an integer and $T$ is the 
symbol duration. Equivalently, we may say that $n = t/T$ denotes the normalized 
(dimensionless) discrete time. 

One other important point to note: recursive estimation of the unknown parameter, be 
it the carrier phase or the symbol timing, plays a key role in the synchronization process. 
Specifically, it proceeds across discrete time in accordance with the following rule: 

$$
\begin{pmatrix} \text{Updated estimate} \\ \text{of the parameter} \end{pmatrix}
= \begin{pmatrix} \text{Old estimate} \\ \text{of the parameter} \end{pmatrix}
+ \begin{pmatrix} \text{Step-size} \\ \text{parameter} \end{pmatrix}
\times \begin{pmatrix} \text{Error} \\ \text{signal} \end{pmatrix} \tag{7.251}
$$

In other words, the recursive parameter estimation takes on the structure of an adaptive 
filtering algorithm, in which the product of the step-size parameter and error signal 
assumes the role of an algorithmic adjustment. 
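
The rule in (7.251) can be sketched in a few lines of Python. This is only an illustration: the function names `recursive_update` and `track_constant` are hypothetical, and the error signal used here (target minus estimate) is a stand-in chosen for the toy example, not the likelihood-based error signals derived later in this section.

```python
def recursive_update(old_estimate, step_size, error_signal):
    """One recursion of (7.251): updated = old + step-size * error signal."""
    return old_estimate + step_size * error_signal


def track_constant(target, step_size, num_steps, initial=0.0):
    """Toy use of the rule; the error signal is simply (target - estimate)."""
    estimate = initial
    history = [estimate]
    for _ in range(num_steps):
        error = target - estimate  # hypothetical error signal for illustration
        estimate = recursive_update(estimate, step_size, error)
        history.append(estimate)
    return history
```

With a step size of 0.5 the estimate halves its distance to the target at every recursion, which makes the role of the step-size parameter easy to see.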

In what follows, we derive adaptive filtering algorithms for estimating the unknown 
synchronization parameters with the error signal being derived from the likelihood function. 


The idea of maximum likelihood parameter estimation based on continuous-time 
waveforms was discussed in Chapter 3. To briefly review the material described therein, 
consider a baseband signal defined by 

x(t) = s( t, A) + w(t) 

where A is an unknown parameter and w(t) denotes an AWGN. Given a sample of the 
signal x(t), the requirement is to estimate the parameter A; so, we say: 


Note that we say “a maximum” rather than “the maximum” because it is possible for the 
graph of $l(A)$ plotted versus $A$ to have multiple maxima. In any event, the likelihood 
function given $x$, namely $l(A)$, is defined as the probability density function $f(x|A)$ with 
the roles of $x$ and $A$ interchanged, as shown by 

$$
l(A) = f(x|A)
$$

where, for convenience of presentation, we have omitted the conditional dependence of $A$ 
on $x$ in $l(A)$. 

In the algorithmic synchronization procedures derived in this section, we will be 
concerned only with cases in which the parameter $A$ is a scalar. Such cases are referred to 
as independent estimation. However, when we are confronted with the synchronization of a 
digital communication receiver to its transmitter operating over a dispersive channel, we 
have two unknown channel-related parameters to deal with: the phase (carrier) delay $\tau_c$ and 
the group (envelope) delay $\tau_g$, both of which were discussed in Chapter 2. In the context of 
these two parameters, when we speak of independent estimation for synchronization, we 
mean that the two parameters $\tau_c$ and $\tau_g$ are considered individually rather than jointly. 
Intuitively speaking, independent estimation is much easier to tackle and visualize than 
joint estimation, and it may yield more robust estimates in general. 
Let the transmitted signal for symbol $i$ in the QPSK signal be defined by 

$$
s_i(t) = \sqrt{\frac{2E}{T}} \cos(2\pi f_c t + \alpha_i), \qquad 0 \le t \le T
$$

where $E$ is the signal energy per symbol, $T$ is the symbol period, and $\alpha_i$ is the carrier 
phase used for transmitting symbol $i$. For example, for the QPSK we have 

$$
\alpha_i = \frac{\pi}{4}(2i - 1), \qquad i = 1, 2, 3, 4
$$

Equivalently, we may write 

$$
s_i(t) = \sqrt{\frac{2E}{T}} \cos(2\pi f_c t + \alpha_i)\, g(t)
$$

where $g(t)$ is the shaping pulse, namely a rectangular pulse of unit amplitude and duration 
$T$. By definition, $\tau_c$ affects the carrier and $\tau_g$ affects the envelope. Accordingly, the 
received signal at the channel output is given by 

$$
\begin{aligned}
x(t) &= \sqrt{\frac{2E}{T}} \cos(2\pi f_c (t - \tau_c) + \alpha_i)\, g(t - \tau_g) + w(t) \\
     &= \sqrt{\frac{2E}{T}} \cos(2\pi f_c t + \theta + \alpha_i)\, g(t - \tau_g) + w(t)
\end{aligned} \tag{7.254}
$$

where $w(t)$ is the channel noise. The new term $\theta$ introduced in (7.254) is an additive 
carrier phase attributed to the phase delay $\tau_c$ produced by the dispersive channel; it is 
defined by 

$$
\theta = -2\pi f_c \tau_c \tag{7.255}
$$

The minus sign is included in the right-hand side of (7.255) to be consistent with previous 
notation used in dealing with signal detection. 

Both the carrier phase $\theta$ and group delay $\tau_g$ are unknown. However, it is assumed that 
they remain essentially constant over the observation interval $0 \le t \le T_0$ or through the 
transmission of a sequence made up of $L_0 = T_0/T$ symbols. 

With $\theta$ used to account for the carrier delay $\tau_c$, we may simplify matters by using $\tau$ 
in place of $\tau_g$ for the group delay; that is, (7.254) is rewritten as 

$$
x(t) = \sqrt{\frac{2E}{T}} \cos(2\pi f_c t + \theta + \alpha_i)\, g(t - \tau) + w(t), \qquad \tau \le t \le T + \tau, \quad i = 1, 2, 3, 4
$$


At the receiver, the orthogonal pair of basis functions for QPSK signals is defined by 

$$
\phi_1(t) = \sqrt{\frac{2}{T}} \cos(2\pi f_c t), \qquad \tau \le t \le T + \tau
$$

$$
\phi_2(t) = \sqrt{\frac{2}{T}} \sin(2\pi f_c t), \qquad \tau \le t \le T + \tau
$$

Here, it is assumed that the receiver has perfect knowledge of the carrier frequency $f_c$, 
which is a reasonable assumption; otherwise, a carrier-frequency offset has to be included 
that will complicate the analysis. 
Accordingly, we may represent the received signal $x(t)$ by the baseband vector 

$$
\mathbf{x}(\tau) = \begin{bmatrix} x_1(\tau) \\ x_2(\tau) \end{bmatrix} \tag{7.259}
$$

where 

$$
x_k(\tau) = \int_{\tau}^{T+\tau} x(t)\, \phi_k(t)\, dt, \qquad k = 1, 2
$$


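In a corresponding fashion, we may express the signal component of $\mathbf{x}(\tau)$ by the vector 

$$
\mathbf{s}(\alpha_i, \theta, \tau) = \begin{bmatrix} s_1(\alpha_i, \theta, \tau) \\ s_2(\alpha_i, \theta, \tau) \end{bmatrix}
$$

where 

$$
s_k(\alpha_i, \theta, \tau) = \int_{\tau}^{T+\tau} \sqrt{\frac{2E}{T}} \cos(2\pi f_c t + \theta + \alpha_i)\, \phi_k(t)\, dt, \qquad k = 1, 2, \quad i = 1, 2, 3, 4 \tag{7.262}
$$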


Assuming that $f_c$ is an integer multiple of the symbol rate $1/T$, evaluation of the integral in 
(7.262) shows that dependence of $s_1$ and $s_2$ on the group delay $\tau$ is eliminated, as shown 
by 

$$
s_1(\alpha_i, \theta) = \sqrt{E} \cos(\theta + \alpha_i)
$$

$$
s_2(\alpha_i, \theta) = -\sqrt{E} \sin(\theta + \alpha_i)
$$
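
The claim that the group delay drops out of the integral in (7.262) can be checked numerically. The sketch below is illustrative only: the function name `project`, the midpoint-rule integration, and the parameter values (unit energy, unit symbol period, four carrier cycles per symbol) are assumptions made for the example.

```python
import math


def project(k, tau, theta, alpha, E=1.0, T=1.0, cycles=4, steps=4000):
    """Midpoint-rule value of the integral in (7.262) for k = 1 or 2.

    Integrates sqrt(2E/T) cos(2 pi f_c t + theta + alpha) * phi_k(t) over
    [tau, T + tau], with f_c = cycles / T (an integer multiple of 1/T).
    """
    fc = cycles / T
    dt = T / steps
    total = 0.0
    for m in range(steps):
        t = tau + (m + 0.5) * dt
        carrier = math.sqrt(2.0 * E / T) * math.cos(2 * math.pi * fc * t + theta + alpha)
        if k == 1:
            basis = math.sqrt(2.0 / T) * math.cos(2 * math.pi * fc * t)
        else:
            basis = math.sqrt(2.0 / T) * math.sin(2 * math.pi * fc * t)
        total += carrier * basis * dt
    return total
```

Evaluating `project(1, tau, theta, alpha)` and `project(2, tau, theta, alpha)` for two different values of `tau` should give the same pair of numbers, namely $\sqrt{E}\cos(\theta + \alpha_i)$ and $-\sqrt{E}\sin(\theta + \alpha_i)$, up to rounding error.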

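We may thus expand on (7.259) to write 

$$
\mathbf{x}(\tau) = \mathbf{s}(\alpha_i, \theta) + \mathbf{w}(\tau), \qquad i = 1, 2, 3, 4 \tag{7.265}
$$

where 

$$
\mathbf{w}(\tau) = \begin{bmatrix} w_1(\tau) \\ w_2(\tau) \end{bmatrix}
$$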


The two elements of the noise vector $\mathbf{w}$ are themselves defined by 

$$
w_k = \int_{\tau}^{T+\tau} w(t)\, \phi_k(t)\, dt, \qquad k = 1, 2 \tag{7.267}
$$

The $w_k$ in (7.267) is the sample value of a Gaussian random variable $W$ of zero mean and 
variance $N_0/2$, where $N_0/2$ is the power spectral density of the channel noise $w(t)$. 
Dependence of the baseband signal vector $\mathbf{x}$ on delay $\tau$ is inherited from (7.265). 

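The conditional probability density function of the random vector $\mathbf{X}$, represented by the 
sample $\mathbf{x}$ at the receiver input given transmission of the $i$th symbol, and occurrence of the 
carrier phase $\theta$ and group delay $\tau$ resulting from the dispersive channel, is defined by 

$$
f_{\mathbf{X}}(\mathbf{x} \mid \alpha_i, \theta, \tau) = \frac{1}{\pi N_0} \exp\left(-\frac{1}{N_0} \|\mathbf{x}(\tau) - \mathbf{s}(\alpha_i, \theta)\|^2\right) \tag{7.268}
$$

Setting $\mathbf{s}(\alpha_i, \theta)$ equal to zero, (7.268) reduces to 

$$
f_{\mathbf{X}}(\mathbf{x} \mid \mathbf{s} = \mathbf{0}) = \frac{1}{\pi N_0} \exp\left(-\frac{1}{N_0} \|\mathbf{x}(\tau)\|^2\right) \tag{7.269}
$$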


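Equation (7.268) defines the probability density function of the random vector $\mathbf{X}$ in the 
combined presence of signal and channel noise, whereas (7.269) defines the probability 
density function of $\mathbf{X}$ in the presence of channel noise acting alone. Accordingly, we may 
define the likelihood function for QPSK as the ratio of these two probability density 
functions, as shown by 

$$
\begin{aligned}
l(\alpha_i, \theta, \tau) &= \frac{f_{\mathbf{X}}(\mathbf{x} \mid \alpha_i, \theta, \tau)}{f_{\mathbf{X}}(\mathbf{x} \mid \mathbf{s} = \mathbf{0})} \\
&= \exp\left(\frac{2}{N_0} \mathbf{x}^{\mathrm{T}}(\tau)\, \mathbf{s}(\alpha_i, \theta) - \frac{1}{N_0} \|\mathbf{s}(\alpha_i, \theta)\|^2\right)
\end{aligned} \tag{7.270}
$$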


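In QPSK, we have 

$$
\|\mathbf{s}(\alpha_i, \theta)\| = \text{constant}
$$

because all four message points lie on a circle of radius $\sqrt{E}$. Hence, ignoring the second 
term in the exponent in (7.270), we may reduce the likelihood function to 

$$
l(\alpha_i, \theta, \tau) = \exp\left(\frac{2}{N_0} \mathbf{x}^{\mathrm{T}}(\tau)\, \mathbf{s}(\alpha_i, \theta)\right) \tag{7.271}
$$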
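
As a quick sanity check on the step from (7.268) and (7.269) to (7.270), the ratio of the two Gaussian densities can be compared numerically with the closed-form exponential. The helper names below are hypothetical, and the sample values used when calling them are arbitrary.

```python
import math


def gaussian_pdf_2d(x, s, N0):
    """Density of (7.268): two independent components, each of variance N0/2."""
    d2 = (x[0] - s[0]) ** 2 + (x[1] - s[1]) ** 2
    return math.exp(-d2 / N0) / (math.pi * N0)


def likelihood_ratio(x, s, N0):
    """Ratio of the signal-plus-noise density to the noise-only density."""
    return gaussian_pdf_2d(x, s, N0) / gaussian_pdf_2d(x, (0.0, 0.0), N0)


def likelihood_closed_form(x, s, N0):
    """Right-hand side of (7.270): exp((2/N0) x^T s - (1/N0) ||s||^2)."""
    inner = x[0] * s[0] + x[1] * s[1]
    energy = s[0] ** 2 + s[1] ** 2
    return math.exp(2.0 * inner / N0 - energy / N0)
```

For any observation `x`, signal point `s`, and noise level `N0`, the two functions should agree to within floating-point rounding, because the $\|\mathbf{x}\|^2$ terms cancel in the ratio.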


Before proceeding with the derivations of adaptive filtering algorithms for recovery of the 
clock and carrier, we find it instructive to reformulate the likelihood function of (7.271) 
using complex terminology. Such a step is apropos given the fact that the received signal 
vector as well as its constituent signal and noise vectors in (7.265) are all in their respective 
baseband forms. 

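Specifically, the two-dimensional vector $\mathbf{x}(\tau)$ is represented by the complex envelope 
of the received signal 

$$
\tilde{x}(\tau) = x_1 + j x_2 \tag{7.272}
$$

where $j = \sqrt{-1}$. 

Correspondingly, the signal vector $\mathbf{s}(\alpha_i, \theta)$, comprising the pair of signal components 
$s_1(\alpha_i, \theta)$ and $s_2(\alpha_i, \theta)$, is represented by the complex envelope of the transmitted signal 
corrupted by carrier phase $\theta$: 

$$
\begin{aligned}
\tilde{s}(\alpha_i, \theta) &= s_1(\alpha_i, \theta) + j s_2(\alpha_i, \theta) \\
&= \sqrt{E}\,[\cos(\alpha_i + \theta) + j \sin(\alpha_i + \theta)] \\
&= \sqrt{E}\, \tilde{\alpha}_i e^{j\theta}, \qquad i = 1, 2, 3, 4
\end{aligned} \tag{7.273}
$$

The new complex parameter $\tilde{\alpha}_i$ in (7.273) is a symbol indicator in the message 
constellation of the QPSK; it is defined by 

$$
\tilde{\alpha}_i = e^{j\alpha_i} = \cos\alpha_i + j \sin\alpha_i \tag{7.274}
$$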




Correspondingly, the complex exponential factor embodying the carrier phase $\theta$ is 
defined by 

$$
e^{j\theta} = \cos\theta + j \sin\theta \tag{7.275}
$$

Both (7.274) and (7.275) follow from Euler’s formula. 

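With the complex representations of (7.272) to (7.275) at hand, we may now reformulate 
the exponent of the likelihood function in (7.271) in the equivalent complex form: 

$$
\begin{aligned}
\frac{2}{N_0} \mathbf{x}^{\mathrm{T}} \mathbf{s}(\alpha_i, \theta) &= \frac{2}{N_0} \operatorname{Re}[\tilde{x}(\tau)\, \tilde{s}^*(\alpha_i, \theta)] \\
&= \frac{2\sqrt{E}}{N_0} \operatorname{Re}[\tilde{x}(\tau)\, \tilde{\alpha}_i^* e^{-j\theta}]
\end{aligned} \tag{7.276}
$$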

where Re[.] denotes the real part of the complex expression inside the square brackets. 
Hence, we may make the following statement: 


Two points are noteworthy here: 

The complex envelope of the received signal is dependent on the group delay $\tau$, 
hence $\tilde{x}(\tau)$. The product $\tilde{\alpha}_i e^{j\theta}$ is made up of the complex symbol indicator $\tilde{\alpha}_i$ 
attributed to the QPSK signal generated in the transmitter and the exponential term 
$e^{j\theta}$ attributed to phase distortion in the channel. 

In complex variable theory, given a pair of complex terms $\tilde{x}(\tau)$ and $\tilde{\alpha}_i e^{j\theta}$, their 
inner product could be defined as $\tilde{x}(\tau)(\tilde{\alpha}_i e^{j\theta})^* = \tilde{x}(\tau)\, \tilde{\alpha}_i^* e^{-j\theta}$, as shown in (7.276). 

The complex representation on the right-hand side of (7.276), expressed in Cartesian 
form, is well suited for estimating the unknown phase $\theta$. On the other hand, for estimating 
the unknown group delay $\tau$, we find it more convenient to use a polar representation for 
the inner product of the two vectors $\mathbf{x}(\tau)$ and $\mathbf{s}(\alpha_i, \theta)$, as shown by 

$$
\frac{2}{N_0} \mathbf{x}^{\mathrm{T}}(\tau)\, \mathbf{s}(\alpha_i, \theta) = \frac{2\sqrt{E}}{N_0} |\tilde{\alpha}_i \tilde{x}(\tau)| \cos(\arg[\tilde{x}(\tau)] - \arg[\tilde{\alpha}_i] - \theta) \tag{7.277}
$$

It is a straightforward matter to show that the two complex representations on the 
right-hand side of (7.276) and (7.277) are equivalent. The reasons why these two 
representations befit the estimation of carrier phase $\theta$ and group delay $\tau$, respectively, 
will become apparent in the next two subsections. 
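
The equivalence of (7.276) and (7.277) is easy to confirm numerically. The following sketch uses hypothetical function names and takes the complex signal envelope as $\sqrt{E}\,\tilde{\alpha}_i e^{j\theta}$, as in (7.273); the value of the received envelope is arbitrary.

```python
import cmath
import math


def qpsk_indicators():
    """Symbol indicators exp(j * alpha_i) with alpha_i = (pi/4)(2i - 1)."""
    return [cmath.exp(1j * math.pi * (2 * i - 1) / 4) for i in range(1, 5)]


def exponent_cartesian(x, alpha, theta, E=1.0, N0=1.0):
    """Right-hand side of (7.276)."""
    return (2 * math.sqrt(E) / N0) * (x * alpha.conjugate() * cmath.exp(-1j * theta)).real


def exponent_polar(x, alpha, theta, E=1.0, N0=1.0):
    """Right-hand side of (7.277)."""
    angle = cmath.phase(x) - cmath.phase(alpha) - theta
    return (2 * math.sqrt(E) / N0) * abs(alpha * x) * math.cos(angle)


def exponent_vector(x, alpha, theta, E=1.0, N0=1.0):
    """(2/N0) x^T s, with s1 and s2 taken as the real and imaginary parts of (7.273)."""
    s = math.sqrt(E) * alpha * cmath.exp(1j * theta)
    return (2 / N0) * (x.real * s.real + x.imag * s.imag)
```

All three functions should return the same number for every QPSK symbol, which is the sense in which the Cartesian and polar forms are interchangeable.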

Moreover, in light of what was said previously, estimation of the group delay should 
precede that of the carrier phase. Accordingly, the next subsection is devoted to 
group-delay estimation, followed by the subsection devoted to carrier-phase estimation. 


To begin the task of estimating the unknown group delay, first of all we have to remove 
dependence of the likelihood function $l(\alpha_i, \theta, \tau)$ on the unknown carrier phase $\theta$ in 
(7.271). To do this, we will average the likelihood function over all possible values of $\theta$ 
inside the range $[0, 2\pi]$. To this end, $\theta$ is assumed to be uniformly distributed inside this 
range, as shown by 

$$
f_{\Theta}(\theta) = \begin{cases} \dfrac{1}{2\pi}, & 0 \le \theta \le 2\pi \\ 0, & \text{otherwise} \end{cases}
$$


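which is the worst possible situation that can arise in practice. Under this assumption, we 
may thus express the average likelihood function as 

$$
\begin{aligned}
l_{\text{av}}(\tilde{\alpha}_i, \tau) &= \int_0^{2\pi} l(\alpha_i, \theta, \tau)\, f_{\Theta}(\theta)\, d\theta \\
&= \frac{1}{2\pi} \int_0^{2\pi} l(\alpha_i, \theta, \tau)\, d\theta \\
&= \frac{1}{2\pi} \int_0^{2\pi} \exp\left(\frac{2}{N_0} \mathbf{x}^{\mathrm{T}}(\tau)\, \mathbf{s}(\alpha_i, \theta)\right) d\theta
\end{aligned} \tag{7.279}
$$

where, in the last line, we used (7.271). 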

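Examining the two alternative complex representations of the likelihood function’s 
exponent given in (7.276) and (7.277), it is the latter that best suits solving the integration 
in (7.279). Specifically, we may write 

$$
\begin{aligned}
l_{\text{av}}(\tilde{\alpha}_i, \tau) &= \frac{1}{2\pi} \int_0^{2\pi} \exp\left(\frac{2\sqrt{E}}{N_0} |\tilde{\alpha}_i \tilde{x}(\tau)| \cos(\arg[\tilde{x}(\tau)] - \arg[\tilde{\alpha}_i] - \theta)\right) d\theta \\
&= \frac{1}{2\pi} \int_{-\arg[\tilde{x}(\tau)] + \arg[\tilde{\alpha}_i]}^{2\pi - \arg[\tilde{x}(\tau)] + \arg[\tilde{\alpha}_i]} \exp\left(\frac{2\sqrt{E}}{N_0} |\tilde{\alpha}_i \tilde{x}(\tau)| \cos\varphi\right) d\varphi
\end{aligned} \tag{7.280}
$$

where, in the last line, we have made the substitution 

$$
\varphi = \arg[\tilde{x}(\tau)] - \arg[\tilde{\alpha}_i] - \theta
$$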

We now invoke the definition of the modified Bessel function of zero order, as shown by 
(see Appendix C) 

$$
I_0(x) = \frac{1}{2\pi} \int_0^{2\pi} e^{x \cos\varphi}\, d\varphi
$$

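Using this formula, we may, therefore, express the average likelihood function $l_{\text{av}}(\tilde{\alpha}_i, \tau)$ 
in (7.280) as follows: 

$$
l_{\text{av}}(\tilde{\alpha}_i, \tau) = I_0\left(\frac{2\sqrt{E}}{N_0} |\tilde{\alpha}_i \tilde{x}(\tau)|\right) \tag{7.282}
$$

where $\tilde{x}(\tau)$ is the complex envelope of the matched filter output in the receiver. By 
definition, for QPSK we have 

$$
|\tilde{\alpha}_i| = 1 \quad \text{for all } i
$$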
It follows, therefore, that (7.282) reduces to 

$$
l_{\text{av}}(\tau) = I_0\left(\frac{2\sqrt{E}}{N_0} |\tilde{x}(\tau)|\right) \tag{7.283}
$$

Here, it is important to note that, as a result of averaging the likelihood function over the 
carrier phase $\theta$, we have also removed dependence on the transmitted symbol $\tilde{\alpha}_i$ for 
QPSK; this result is intuitively satisfying. 

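In any event, taking the natural logarithm of $l_{\text{av}}(\tau)$ in (7.283) to obtain the 
log-likelihood function of $\tau$, we write 

$$
L_{\text{av}}(\tau) = \ln l_{\text{av}}(\tau) = \ln I_0\left(\frac{2\sqrt{E}}{N_0} |\tilde{x}(\tau)|\right) \tag{7.284}
$$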



where $\ln$ denotes the natural logarithm. To proceed further, we need to find a good 
approximation for $L_{\text{av}}(\tau)$. To this end, we first note that the modified Bessel function $I_0(x)$ 
may itself be expanded in a power series (see Appendix C): 

$$
I_0(x) = \sum_{m=0}^{\infty} \frac{(x/2)^{2m}}{(m!)^2}
$$

where $x$ stands for the product term $(2\sqrt{E}/N_0)\,|\tilde{x}(\tau)|$. For small values of $x$, we may thus 
approximate $I_0(x)$ as shown by 

$$
I_0(x) \approx 1 + \frac{x^2}{4}
$$




We may further simplify matters by using the approximation 

$$
\ln I_0(x) \approx \ln\left(1 + \frac{x^2}{4}\right) \approx \frac{x^2}{4} \quad \text{for small } x
$$

For the problem at hand, small $x$ corresponds to small SNR. Under this condition, we may 
now approximate the log-likelihood function of (7.284) as follows: 

$$
L_{\text{av}}(\tau) \approx \frac{E}{N_0^2} |\tilde{x}(\tau)|^2
$$
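
The three facts used in this part of the derivation can be checked numerically: the integral definition of $I_0(x)$ agrees with its power series, the phase average of the likelihood does not depend on the angle offset, and $\ln I_0(x)$ is close to $x^2/4$ for small $x$. The sketch below relies only on the standard library; the function names and the midpoint-rule integration are choices made for this illustration.

```python
import math


def bessel_i0_integral(x, steps=2000):
    """I_0(x) from its integral definition, evaluated with the midpoint rule."""
    dphi = 2 * math.pi / steps
    total = sum(math.exp(x * math.cos((m + 0.5) * dphi)) for m in range(steps))
    return total * dphi / (2 * math.pi)


def bessel_i0_series(x, terms=30):
    """I_0(x) from the power series sum over m of (x/2)^(2m) / (m!)^2."""
    return sum((x / 2.0) ** (2 * m) / math.factorial(m) ** 2 for m in range(terms))


def averaged_likelihood(c, offset, steps=2000):
    """(1/2pi) * integral over theta in [0, 2pi] of exp(c * cos(offset - theta))."""
    dtheta = 2 * math.pi / steps
    total = sum(math.exp(c * math.cos(offset - (m + 0.5) * dtheta)) for m in range(steps))
    return total * dtheta / (2 * math.pi)
```

Calling `averaged_likelihood(c, offset)` with different values of `offset` should return the same value, `bessel_i0_series(c)`, which mirrors the way the symbol phase and the argument of the received envelope disappear after averaging over the carrier phase.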

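With maximization of $L_{\text{av}}(\tau)$ as the objective, we differentiate it with respect to the 
envelope delay $\tau$, obtaining 

$$
\begin{aligned}
\frac{\partial L_{\text{av}}(\tau)}{\partial \tau} &= \frac{E}{N_0^2}\, \frac{\partial}{\partial \tau} |\tilde{x}(\tau)|^2 \\
&= \frac{2E}{N_0^2} \operatorname{Re}[\tilde{x}^*(\tau)\, \tilde{x}'(\tau)]
\end{aligned} \tag{7.287}
$$

where $\tilde{x}^*(\tau)$ is the complex conjugate of $\tilde{x}(\tau)$ and $\tilde{x}'(\tau)$ is its derivative with respect to $\tau$. 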
The formula in (7.287) is the result of operating on the received signal at the channel 
output, $x(t)$, defined in (7.254) for a particular symbol of the QPSK signal defined in the 
interval $[\tau, T + \tau]$. In the course of finding the baseband vector representation of the 
received signal, namely $\tilde{x}(\tau)$, dependence on time $t$ disappeared in (7.287). 
Notwithstanding this point, the fact of the matter is that the log-likelihood ratio $L_{\text{av}}(\tau)$ in 
(7.287) pertains to some point in discrete time $n = t/T$, and it changes with $n$. To go forward 
with recursive estimation of the group delay $\tau$, we must therefore bring discrete time $n$ into 
the procedure. To this end, $n$ is assigned as a subscript to both $\tilde{x}(\tau)$ and $\tilde{x}'(\tau)$ in (7.287). 
Thus, with the recursive estimation of $\tau$ following the format described in words in (7.251), 
we may define the error signal needed for the recursive estimation of $\tau$ (i.e., symbol-timing 
recovery) as follows: 

$$
e_n = \operatorname{Re}[\tilde{x}_n^*(\tau)\, \tilde{x}'_n(\tau)] \tag{7.288}
$$

Let $\tau_n$ denote the estimate of the unknown group delay $\tau$ at discrete time $n$. 
Correspondingly, we may introduce two definitions 

$$
\tilde{x}_n(\tau) = \tilde{x}(nT + \tau_n)
$$

and 

$$
\tilde{x}'_n(\tau) = \tilde{x}'(nT + \tau_n)
$$

Accordingly, we may reformulate the error signal $e_n$ in (7.288) as follows: 

$$
e_n = \operatorname{Re}[\tilde{x}^*(nT + \tau_n)\, \tilde{x}'(nT + \tau_n)]
$$


Computation of the error signal $e_n$, therefore, requires the use of two filters: 

Complex matched filter, which is used for generating $\tilde{x}_n(\tau)$. 

Complex derivative matched filter, which is used for generating $\tilde{x}'_n(\tau)$. 

By design, the receiver is already equipped with the first filter. The second one is new. In 
practice, the additional computational complexity due to the derivative matched filter is 
found to be an undesirable requirement. To dispense with the need for it, we propose to 
approximate the derivative using a finite difference, as shown by 

$$
\tilde{x}'(nT + \tau_n) \approx \frac{1}{T}\left[\tilde{x}\left(nT + \frac{T}{2} + \tau_{n+1/2}\right) - \tilde{x}\left(nT - \frac{T}{2} + \tau_{n-1/2}\right)\right] \tag{7.292}
$$


Note, however, that in using the finite-difference approximation of (7.292) we have 
simplified computation of the derivative matched filter by doubling the symbol rate. It is 
desirable to make one further modification to account for the fact that timing estimates are 
updated at multiples of the symbol period $T$ and the only available quantities are $\tau_n$. 
Consequently, we replace $\tau_{n+1/2}$ by the current (updated) estimate $\tau_n$ and replace $\tau_{n-1/2}$ 
by the old estimate $\tau_{n-1}$. We may thus rewrite (7.292) as follows: 

$$
\tilde{x}'(nT + \tau_n) \approx \frac{1}{T}\left[\tilde{x}\left(nT + \frac{T}{2} + \tau_n\right) - \tilde{x}\left(nT - \frac{T}{2} + \tau_{n-1}\right)\right] \tag{7.293}
$$



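So, we finally redefine the error signal as follows: 

$$
e_n = \operatorname{Re}\left\{\tilde{x}^*(nT + \tau_n)\left[\tilde{x}\left(nT + \frac{T}{2} + \tau_n\right) - \tilde{x}\left(nT - \frac{T}{2} + \tau_{n-1}\right)\right]\right\} \tag{7.294}
$$

where the scaling factor $1/T$ is accounted for in what follows. 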
Finally, building on the format of the recursive estimation procedure described in 
(7.251), we may formulate the adaptive filtering algorithm for symbol timing recovery: 

$$
c_{n+1} = c_n + \gamma e_n \tag{7.295}
$$

where we have the following: 

• The $\gamma$ in (7.295) is the step-size parameter, in which the two scaling factors $2E/N_0^2$ 
and $1/T$ are absorbed; the factor $2E/N_0^2$ was ignored in moving from (7.287) to 
(7.288) and the factor $1/T$ was ignored in moving from (7.293) to (7.294). 

• The error signal $e_n$ is defined by (7.294). 

• The $c_n$ is a real number employed as control for the frequency of an oscillator, 
referred to as a number-controlled oscillator (NCO). 

The closed-loop feedback system for implementing the timing-recovery algorithm of 
(7.295) is shown in Figure 7.48. From a historical perspective, the scheme shown in this 
figure is analogous to the continuous-time version of the traditional early-late gate 
synchronizer widely used for timing recovery. In light of this analogy, the scheme of 
Figure 7.48 is referred to as a nondata-aided early-late delay (NDA-ELD) synchronizer. At 
every recursion (i.e., time step), the synchronizer works on three successive samples of the 
matched filter output, namely: 

$$
\tilde{x}\left(nT + \frac{T}{2} + \tau_n\right), \qquad \tilde{x}(nT + \tau_n), \qquad \tilde{x}\left(nT - \frac{T}{2} + \tau_{n-1}\right)
$$

The first sample is early and the last one is late, both defined with respect to the middle one. 
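
To make the timing recursion concrete, the following Python sketch simulates it on a toy signal. It is a simplified illustration under several assumptions that are not part of the text: the channel is noise-free, the shaping pulse is rectangular so that the matched-filter output interpolates linearly between adjacent symbols, and the timing estimate is updated directly as $\tau_{n+1} = \tau_n + \gamma e_n$ rather than through the NCO control $c_n$. All function names and parameter values are hypothetical.

```python
import cmath
import math
import random

QPSK_POINTS = [cmath.exp(1j * math.pi * (2 * i - 1) / 4) for i in range(1, 5)]


def matched_filter_output(t, symbols, tau_true, theta, T=1.0):
    """Noise-free complex matched-filter output at time t (toy model).

    With rectangular pulses, the integration window overlaps two symbols,
    so the output is a linear interpolation between them.
    """
    u = (t - tau_true) / T
    k = math.floor(u)
    frac = u - k
    value = symbols[k] * (1.0 - frac) + symbols[k + 1] * frac
    return value * cmath.exp(1j * theta)  # unknown carrier phase rotates the envelope


def eld_error(n, tau_n, tau_prev, symbols, tau_true, theta, T=1.0):
    """Error signal of (7.294), built from three matched-filter samples."""
    mid = matched_filter_output(n * T + tau_n, symbols, tau_true, theta, T)
    plus_half = matched_filter_output(n * T + T / 2 + tau_n, symbols, tau_true, theta, T)
    minus_half = matched_filter_output(n * T - T / 2 + tau_prev, symbols, tau_true, theta, T)
    return (mid.conjugate() * (plus_half - minus_half)).real


def mean_error(offset, num_symbols=2000, seed=1):
    """Average error signal when the timing estimate is held at tau_true + offset."""
    rng = random.Random(seed)
    symbols = [rng.choice(QPSK_POINTS) for _ in range(num_symbols + 3)]
    total = 0.0
    for n in range(2, num_symbols):
        total += eld_error(n, offset, offset, symbols, 0.0, 0.7)
    return total / (num_symbols - 2)


def run_timing_loop(num_symbols=3000, tau_true=0.2, gamma=0.01, theta=0.7, seed=1):
    """Iterate tau_{n+1} = tau_n + gamma * e_n and return the final estimate."""
    rng = random.Random(seed)
    symbols = [rng.choice(QPSK_POINTS) for _ in range(num_symbols + 3)]
    tau_prev = tau_n = 0.0
    for n in range(2, num_symbols):
        e_n = eld_error(n, tau_n, tau_prev, symbols, tau_true, theta)
        tau_prev, tau_n = tau_n, tau_n + gamma * e_n
    return tau_n
```

In this toy setting the averaged error signal is negative when the estimate is too large and positive when it is too small, so the recursion pushes the estimate toward the true delay. Note that the carrier phase `theta` has no influence on the error signal, because it cancels in the product of the conjugated middle sample with the difference of the other two.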


Figure 7.48 Nondata-aided early-late delay synchronizer for estimating the group delay. 
(Blocks recoverable from the diagram: sampler at $t = nT + \tau_n$, error detector, loop filter, 
and NCO.) 

With estimation of the symbol time $\tau$ taken care of, the next step is to estimate the carrier 
phase $\theta$. This estimation is also based on the likelihood function defined in (7.270), but 
with a difference: this time we use the complex representation on the right-hand side of 
(7.276) for the likelihood function’s exponent. Thus, the likelihood function of $\theta$ is now 
expressed as follows: 

$$
l(\theta) = \exp\left(\frac{2\sqrt{E}}{N_0} \operatorname{Re}[\tilde{x}(\tau)\, \tilde{\alpha}_i^* e^{-j\theta}]\right) \tag{7.296}
$$


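Taking the natural logarithm of both sides of (7.296), the log-likelihood function of $\theta$ is, 
therefore, given by 

$$
L(\theta) = \frac{2\sqrt{E}}{N_0} \operatorname{Re}[\tilde{x}(\tau)\, \tilde{\alpha}_i^* e^{-j\theta}] \tag{7.297}
$$

Here again, with maximization of the log-likelihood function as the issue of interest, we 
differentiate $L(\theta)$ with respect to $\theta$, obtaining 

$$
\frac{\partial L(\theta)}{\partial \theta} = \frac{2\sqrt{E}}{N_0}\, \frac{\partial}{\partial \theta} \operatorname{Re}[\tilde{x}(\tau)\, \tilde{\alpha}_i^* e^{-j\theta}]
$$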


The real-part operator Re[.] is linear; therefore, we may interchange this operation with 
the differentiation. Moreover, we have 

$$
\frac{\partial}{\partial \theta} e^{-j\theta} = -j e^{-j\theta}
$$

As a result of the differentiation, the argument $\tilde{x}(\tau)\, \tilde{\alpha}_i^* e^{-j\theta}$ in (7.297) is multiplied by $-j$, 
which, in turn, has the effect of replacing the real-part operator Re[.] by the corresponding 
imaginary-part operator Im[.]. Accordingly, we may express the derivative of the 
log-likelihood function in (7.297) with respect to $\theta$ as follows: 

$$
\frac{\partial L(\theta)}{\partial \theta} = \frac{2\sqrt{E}}{N_0} \operatorname{Im}[\tilde{x}(\tau)\, \tilde{\alpha}_i^* e^{-j\theta}] \tag{7.298}
$$


With this equation at hand, we are now ready to formulate the adaptive filtering algorithm 
for estimating the unknown carrier phase $\theta$. To this end, we incorporate discrete time $n$ 
into the recursive estimation procedure for carrier recovery in a manner similar to what we 
did for the group delay; specifically: 

With the argument of the imaginary-part operator in (7.298) playing the role of error 
signal, we write: 

$$
e_n = \operatorname{Im}[\tilde{x}_n(\tau)\, \tilde{\alpha}_n^* e^{-j\theta_n}] \tag{7.299}
$$

where $n$ denotes the normalized discrete time. 

The scaling factor $2\sqrt{E}/N_0$ is absorbed in the new step-size parameter $\mu$. 

With $\theta_n$ denoting the old estimate of the carrier phase $\theta$ and $\theta_{n+1}$ denoting its 
updated value, the update rule for the estimation is defined as follows: 

$$
\theta_{n+1} = \theta_n + \mu e_n, \qquad n = 0, 1, 2, 3, \ldots \tag{7.300}
$$
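
The pair (7.299) and (7.300) translates almost line for line into code. The sketch below is a minimal, noise-free illustration with assumptions that go beyond the text: the symbol timing is taken as already recovered, the matched-filter output is modeled as $\sqrt{E}\,\tilde{\alpha}_n e^{j\theta}$, the symbols cycle through the four QPSK points, and the symbol indicator is estimated by a hard decision on the derotated sample. The names `detect` and `costas_loop` are hypothetical.

```python
import cmath
import math

QPSK = [cmath.exp(1j * math.pi * (2 * i - 1) / 4) for i in range(1, 5)]


def detect(y):
    """Hard decision: the QPSK symbol indicator in the same quadrant as y."""
    return complex(math.copysign(1.0, y.real), math.copysign(1.0, y.imag)) / math.sqrt(2.0)


def costas_loop(theta_true, mu=0.1, num_symbols=400, E=1.0, theta0=0.0):
    """Run the first-order phase recursion and return the final phase estimate."""
    theta_n = theta0
    for n in range(num_symbols):
        alpha = QPSK[n % 4]                                       # transmitted symbol indicator
        x_n = math.sqrt(E) * alpha * cmath.exp(1j * theta_true)   # noise-free matched-filter output
        alpha_hat = detect(x_n * cmath.exp(-1j * theta_n))        # decision on the derotated sample
        e_n = (x_n * alpha_hat.conjugate() * cmath.exp(-1j * theta_n)).imag  # (7.299)
        theta_n = theta_n + mu * e_n                              # (7.300)
    return theta_n
```

When the decisions are correct, the error signal equals $\sqrt{E}\sin(\theta - \theta_n)$, so the estimate moves toward the true phase. Because the hard decision only resolves the quadrant, a loop of this kind can be expected to settle on the true phase only when the initial phase error is smaller than $\pi/4$ in magnitude; larger errors lead to lock at an offset of a multiple of $\pi/2$.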


Equations (7.299) and (7.300) not only define the adaptive filtering algorithm for 
carrier-phase estimation, but also provide the basis for implementing the algorithm, as shown 
in Figure 7.49. This figure may be viewed as a generalization of the well-known Costas loop 
for the analog synchronization of linear quadrature-amplitude modulation schemes that 
involve the combined use of in-phase and quadrature components, of which the QPSK is a 
special example. As such, we may refer to the closed-loop synchronization scheme of Figure 
7.49 as the recursive Costas loop for phase synchronization. 

Figure 7.49 The recursive Costas loop for estimating the carrier phase. (The loop contains a 
first-order recursion filter.) 

The following points should be noted in Figure 7.49: 

• The detector supplies an estimate of the symbol indicator $\tilde{\alpha}_n$ and, therefore, the 
transmitted symbol, given the matched filter output. 

• For the input $\theta_n$, the look-up table in the figure supplies the value of the exponential 

$$
\exp(-j\theta_n) = \cos\theta_n - j \sin\theta_n
$$

• The output of the error generator is the error signal $e_n$, defined in (7.299). 

• The block labeled $z^{-1}$ represents a unit-time delay. 

The recursive Costas loop of Figure 7.49 uses a first-order digital filter. To improve the 
tracking performance of this synchronization system, we may use a second-order digital 
filter. Figure 7.50 shows an example of a second-order recursive filter made up of a 
cascade of two first-order sections, with $\rho$ as an adjustable loop parameter. An important 
property of a second-order recursive filter used in the Costas loop for phase recovery is 
that it will eventually lock onto the incoming carrier with no static error, provided that the 
frequency error between the receiver and transmitter is initially small. 


Figure 7.50 Second-order recursive filter. 

The adaptive behavior of the filtering schemes in Figures 7.48 and 7.49 for group-delay 
and carrier-phase estimation, respectively, is governed by how the step-size parameters 
$\gamma$ and $\mu$ are selected. The smaller we make $\gamma$ and, likewise, $\mu$, the more refined will be 
the trajectories resulting from application of the algorithms. However, this benefit is 
attained at the cost of an increase in the number of recursions required for convergence of 
the algorithms. On the other hand, if the step-size parameter $\gamma$ or $\mu$ is assigned a large 
value, then the trajectories may follow a zig-zag sort of path. Indeed, if $\gamma$ or $\mu$ exceeds a 
certain critical value of its own, it is quite possible for the algorithm to diverge, which 
means that the synchronization schemes of Figures 7.48 and 7.49 may become unstable. So, 
from a design perspective, the compromise choice between accuracy of estimation and speed 
of convergence may require detailed attention, both theoretical and experimental. 
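
The effect of the step size can be seen in a small experiment on the phase recursion. The sketch below makes simplifying assumptions that are not in the text: no noise, correct symbol decisions at every step, and therefore an error signal equal to $\sqrt{E}\sin(\theta - \theta_n)$, which gives the scalar recursion shown in the docstring. The function name and the step-size values are chosen only for illustration.

```python
import math


def phase_error_trajectory(mu, initial_error=0.5, num_steps=200, E=1.0):
    """Noise-free phase-error recursion with correct decisions assumed:

    error_{n+1} = error_n - mu * sqrt(E) * sin(error_n)
    """
    err = initial_error
    trajectory = [err]
    for _ in range(num_steps):
        err = err - mu * math.sqrt(E) * math.sin(err)
        trajectory.append(err)
    return trajectory
```

Under these assumptions, a very small step size (for example 0.01) gives a smooth but slow decay of the phase error, a moderate one (0.5) converges within a few tens of recursions, a larger one (1.5) still converges but with the error alternating in sign from one recursion to the next, and a step size of 2.5 never settles at zero error.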


Summary and Discussion 


The primary goal of the material presented in this chapter is the formulation of a 
systematic procedure for the analysis and design of a digital communication receiver in 
the presence of AWGN. The procedure, known as maximum likelihood detection, decides 
which particular transmitted symbol is the most likely cause of the noisy signal observed 
at the channel output. The approach that led to the formulation of the maximum likelihood 
detector (receiver) is called signal-space analysis. The basic idea of the approach is to 
represent each member of a set of transmitted signals by an $N$-dimensional vector, where 
$N$ is the number of orthonormal basis functions needed for a unique geometric 
representation of the transmitted signals. The set of signal vectors so formed defines a 
signal constellation in an $N$-dimensional signal space. 

For a given signal constellation, the (average) probability of symbol error, $P_e$, incurred 
in maximum likelihood signal detection over an AWGN channel is invariant to rotation of 
the signal constellation as well as its translation. However, except for a few simple (but 
important) cases, the numerical calculation of $P_e$ is an impractical proposition. To 
overcome this difficulty, the customary practice is to resort to the use of bounds that lend 
themselves to computation in a straightforward manner. In this context, we described the 
union bound that follows directly from the signal-space diagram. The union bound is 
based on an intuitively satisfying idea: 


The results obtained using the union bound are usually fairly accurate, particularly when 
the SNR is high. 

With the basic background theory on optimum receivers covered in the early part of 
Chapter 7 at our disposal, formulas for, or bounds on, the BER were derived for some 
important digital modulation techniques in an AWGN channel: 

PSK, using coherent detection; it is represented by 

• binary PSK; 

• QPSK and its variants, such as the offset QPSK; 

• coherent M-ary PSK, which includes binary PSK and QPSK as special cases with 
M = 2 and M = 4, respectively. 

The DPSK may be viewed as the pseudo-noncoherent form of PSK. 
M-ary QAM, using coherent detection; this modulation scheme is a hybrid form of 
modulation that combines amplitude and phase-shift keying. For M = 4, it includes 
QPSK as a special case. 

FSK, using coherent detection; it is represented by 

• binary FSK; 

• MSK and its Gaussian variant known as GMSK; 

• M-ary FSK. 

Noncoherent detection schemes, involving the use of binary FSK and DPSK. 

Irrespective of the digital modulation system of interest, synchronization of the receiver to 
the transmitter is essential to the operation of the system. Symbol timing recovery is 
required whether the receiver is coherent or not. If the receiver is coherent, we also require 
provision for carrier recovery. In the latter part of the chapter we discussed nondata-aided 
synchronizers to cater to these two requirements with emphasis on M-ary PSK, 
exemplified by QPSK signals, in which the carrier is suppressed. The presentation focused 
on recursive synchronization techniques that are naturally suited for the use of 
discrete-time signal processing algorithms. 

We conclude the discussion with some additional notes on the two adaptive filtering 
algorithms described in Section 7.16 on estimating the unknown parameters: carrier phase 
and group delay. In a computational context, these two algorithms are in the same class as 
the celebrated least-mean-square (LMS) algorithm described by Widrow and Hoff over 
50 years ago. The LMS algorithm is known for its computational efficiency, effectiveness 
in performance, and robustness with respect to the nonstationary character of the 
environment in which it is embedded. The two algorithmic phase and delay synchronizers 
share the first two properties of the LMS algorithm; as a conjecture, it may well be that they 
are also robust when operating in a nonstationary communication environment. 


Problems 

In Chapter 6 we described line codes for pulse-code modulation. Referring to the material presented 
therein, formulate the signal constellations for the following line codes: 
unipolar nonreturn-to-zero code 
polar nonreturn-to-zero code 
unipolar return-to-zero code 
Manchester code. 

An 8-level PAM signal is defined by 


where $A_i = \pm 1, \pm 3, \pm 5, \pm 7$. Formulate the signal constellation of $\{s_i(t)\}_{i=1}^{8}$. 

Figure P7.3 displays the waveforms of four signals $s_1(t)$, $s_2(t)$, $s_3(t)$, and $s_4(t)$. 

Using the Gram-Schmidt orthogonalization procedure, find an orthonormal basis for this set of 
signals. 

Construct the corresponding signal-space diagram. 

Figure P7.3 (waveforms of $s_1(t)$ through $s_4(t)$; plot not recoverable from the source). 


Using the Gram-Schmidt orthogonalization procedure, find a set of orthonormal basis functions 
to represent the three signals $s_1(t)$, $s_2(t)$, and $s_3(t)$ shown in Figure P7.4. 

Express each of these signals in terms of the set of basis functions found in part a. 



An orthogonal set of signals is characterized by the property that the inner product of any pair of 
signals in the set is zero. Figure P7.5 shows a pair of signals $s_1(t)$ and $s_2(t)$ that satisfy this definition. 
Construct the signal constellation for this pair of signals. 

Figure P7.5 (waveforms of $s_1(t)$ and $s_2(t)$; plot not recoverable from the source). 

A source of information emits a set of symbols denoted by $\{m_i\}_{i=1}^{M}$. Two candidate modulation 
schemes, namely pulse-duration modulation (PDM) and pulse-position modulation (PPM), are 
considered for the electrical representation of this set of symbols. In PDM, the $i$th symbol is 
represented by a pulse of unit amplitude and duration $(i/M)T$. On the other hand, in PPM, the $i$th 
symbol is represented by a short pulse of unit amplitude and fixed duration, which is transmitted at 
time $t = (i/M)T$. Show that PPM is the only one of the two that can produce an orthogonal set of 
signals over the interval $0 \le t \le T$. 

A set of 2M biorthogonal signals is obtained from a set of M ordinary orthogonal signals by 
augmenting it with the negative of each signal in the set. 

The extension of orthogonal to biorthogonal signals leaves the dimensionality of the signal space 
unchanged. Explain how. 

Construct the signal constellation for the biorthogonal signals corresponding to the pair of 
orthogonal signals shown in Figure P7.5. 

A pair of signals $s_i(t)$ and $s_k(t)$ have a common duration T. Show that the inner product of this 
pair of signals is given by 

$$\int_0^T s_i(t)\, s_k(t)\, dt = \mathbf{s}_i^{\mathrm{T}} \mathbf{s}_k$$

where $\mathbf{s}_i$ and $\mathbf{s}_k$ are the vector representations of $s_i(t)$ and $s_k(t)$, respectively. 

As a follow-up to part a of the problem, show that 

$$\int_0^T \left(s_i(t) - s_k(t)\right)^2 dt = \|\mathbf{s}_i - \mathbf{s}_k\|^2$$

Consider a pair of complex-valued signals $s_1(t)$ and $s_2(t)$ that are respectively represented by 

$$s_1(t) = a_{11}\phi_1(t) + a_{12}\phi_2(t), \qquad -\infty < t < \infty$$

$$s_2(t) = a_{21}\phi_1(t) + a_{22}\phi_2(t), \qquad -\infty < t < \infty$$

where the basis functions $\phi_1(t)$ and $\phi_2(t)$ are both real valued, but the coefficients $a_{11}$, $a_{12}$, $a_{21}$, and 
$a_{22}$ are complex valued. Prove the complex form of the Schwarz inequality: 

$$\left| \int_{-\infty}^{\infty} s_1(t)\, s_2^{*}(t)\, dt \right|^2 \le \int_{-\infty}^{\infty} |s_1(t)|^2\, dt \int_{-\infty}^{\infty} |s_2(t)|^2\, dt$$

where the asterisk denotes complex conjugation. When is this relation satisfied with the equality sign? 

Stochastic Processes 

Consider a stochastic process X(t) expanded in the form 


$$X(t) = \sum_{j=1}^{N} X_j \phi_j(t) + W'(t), \qquad 0 \le t \le T$$

where $W'(t)$ is a remainder noise term. The $\{\phi_j(t)\}_{j=1}^{N}$ form an orthonormal set over the interval 
$0 \le t \le T$, and the random variable $X_j$ is defined by 

$$X_j = \int_0^T X(t)\, \phi_j(t)\, dt$$

Let $W'(t_k)$ denote a random variable obtained by observing $W'(t)$ at time $t = t_k$. Show that 

$$\mathbb{E}[X_j W'(t_k)] = 0, \qquad j = 1, 2, \ldots, N, \quad 0 \le t_k \le T$$


Consider the optimum detection of the sinusoidal signal in AWGN: 

s(t) = 0 <t<T 

Determine the correlator output assuming a noiseless input. 

Determine the corresponding matched filter output, assuming that the filter includes a delay T to 
make it causal. 

Hence, show that these two outputs are exactly the same only at the time instant t = T. 


Probability of Error 

Figure P7.12 shows a pair of signals $s_1(t)$ and $s_2(t)$ that are orthogonal to each other over the 
observation interval $0 \le t \le 3T$. The received signal is defined by 

$$x(t) = s_k(t) + w(t), \qquad 0 \le t \le 3T, \quad k = 1, 2$$

where $w(t)$ is white Gaussian noise of zero mean and power spectral density $N_0/2$. 

Design a receiver that decides in favor of signals $s_1(t)$ or $s_2(t)$, assuming that these two signals are 
equiprobable. 

Calculate the average probability of symbol error incurred by this receiver for $E/N_0 = 4$, where E 
is the signal energy. 




In the Manchester code discussed in Chapter 6, binary symbol 1 is represented by the doublet pulse 
s(t) shown in Figure P7.13, and binary symbol 0 is represented by the negative of this pulse. Derive 
the formula for the probability of error incurred by the maximum likelihood detection procedure 
applied to this form of signaling over an AWGN channel. 



In the Bayes' test, applied to a binary hypothesis-testing problem where we have to choose one of 
two possible hypotheses $H_0$ and $H_1$, we minimize the risk $\mathcal{R}$ defined by 

$$\mathcal{R} = C_{00}\, p_0\, P(\text{say } H_0 \mid H_0 \text{ is true}) + C_{10}\, p_0\, P(\text{say } H_1 \mid H_0 \text{ is true}) + C_{11}\, p_1\, P(\text{say } H_1 \mid H_1 \text{ is true}) + C_{01}\, p_1\, P(\text{say } H_0 \mid H_1 \text{ is true})$$

The parameters $C_{00}$, $C_{10}$, $C_{11}$, and $C_{01}$ denote the costs assigned to the four possible outcomes of the 
experiment: the first subscript indicates the hypothesis chosen and the second the hypothesis that is 
true. Assume that $C_{10} > C_{00}$ and $C_{01} > C_{11}$. The $p_0$ and $p_1$ denote the a priori probabilities of 
hypotheses $H_0$ and $H_1$, respectively. 

Given the observation vector $\mathbf{x}$, show that the partitioning of the observation space so as to 
minimize the risk $\mathcal{R}$ leads to the likelihood ratio test: 

say $H_0$ if $\Lambda(\mathbf{x}) < \lambda$ 
say $H_1$ if $\Lambda(\mathbf{x}) > \lambda$ 

where $\Lambda(\mathbf{x})$ is the likelihood ratio defined by 

$$\Lambda(\mathbf{x}) = \frac{f_{\mathbf{X}}(\mathbf{x} \mid H_1)}{f_{\mathbf{X}}(\mathbf{x} \mid H_0)}$$

and $\lambda$ is the threshold of the test defined by 

$$\lambda = \frac{p_0 (C_{10} - C_{00})}{p_1 (C_{01} - C_{11})}$$

What are the cost values for which the Bayes’ criterion reduces to the minimum probability of 
error criterion? 
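
The mechanics of the likelihood ratio test in this problem can be illustrated with a short numerical sketch. It only evaluates the threshold, namely the ratio of p0 (C10 - C00) to p1 (C01 - C11), and applies the decision rule; it does not derive the test. The function names `bayes_threshold` and `decide`, and the priors and costs used, are hypothetical values chosen for illustration.

```python
def bayes_threshold(p0, p1, c00, c10, c11, c01):
    """Threshold p0 (C10 - C00) / (p1 (C01 - C11)) of the likelihood ratio test."""
    if not (c10 > c00 and c01 > c11):
        raise ValueError("the problem assumes C10 > C00 and C01 > C11")
    return p0 * (c10 - c00) / (p1 * (c01 - c11))

def decide(likelihood_ratio, threshold):
    """Return 1 (say H1) if the likelihood ratio exceeds the threshold, else 0 (say H0)."""
    return 1 if likelihood_ratio > threshold else 0

# Hypothetical priors and costs, chosen only for illustration.
lam = bayes_threshold(p0=0.6, p1=0.4, c00=0.0, c10=2.0, c11=0.0, c01=3.0)
decisions = [decide(ratio, lam) for ratio in (0.25, 4.0)]
```

With these particular numbers the larger cost attached to missing $H_1$ offsets the larger prior of $H_0$, so the threshold comes out close to unity.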

Principles of Rotational and Translational Invariance 

Continuing with the four line codes considered in Problem 7.1, identify the line codes that have 
minimum average energy and those that do not. Compare your answers with the observations made 
on these line codes in Chapter 6. 

Consider the two constellations shown in Figure 7.10. Determine the orthonormal matrix Q that 
transforms the constellation shown in Figure 7.10a into the one shown in Figure 7.10b. 

The two signal constellations shown in Figure P7.17 exhibit the same average probability of 
symbol error. Justify the validity of this statement. 

Which of these two constellations has minimum average energy? Justify your answer. 

You may assume that the symbols pertaining to the message points displayed in Figure P7.17 are 
equally likely. 



[Figure P7.17: two signal constellations, shown in panels (a) and (b).] 


Simplex (transorthogonal) signals are equally likely highly-correlated signals with the most negative 
correlation that can be achieved with a set of M orthogonal signals. That is, the correlation 
coefficient between any pair of signals in the set is defined by 

$$\rho_{ij} = \begin{cases} 1 & \text{for } i = j \\ -1/(M-1) & \text{for } i \ne j \end{cases}$$

One method of constructing simplex signals is to start with a set of M orthogonal signals each with 
energy E and then apply the minimum energy translate. 

Consider a set of three equally likely symbols whose signal constellation consists of the vertices of 
an equilateral triangle. Show that these three symbols constitute a simplex code. 

Amplitude-Shift Keying 

In the on-off keying version of an ASK system, symbol 1 is represented by transmitting a sinusoidal 
carrier of amplitude $\sqrt{2E_b/T_b}$, where $E_b$ is the signal energy per bit and $T_b$ is the bit duration. 
Symbol 0 is represented by switching off the carrier. Assume that symbols 1 and 0 occur with equal 
probability. 

For an AWGN channel, determine the average probability of error for this ASK system under the 
following scenarios: 

Coherent detection. 


Noncoherent detection, operating with a large value of bit energy-to-noise spectral density ratio 
$E_b/N_0$. 

Note: when x is large, the modified Bessel function of the first kind of zero order may be 
approximated as follows (see Appendix C): 

$$I_0(x) \approx \frac{\exp(x)}{\sqrt{2\pi x}}$$
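
How quickly the large-argument approximation exp(x)/sqrt(2 pi x) for the zero-order modified Bessel function takes hold can be checked numerically. The sketch below is only a sanity check under stated assumptions: the reference value is computed from the standard power series of the Bessel function (truncated at a hypothetical 200 terms), and the function names are illustrative.

```python
import math

def bessel_i0(x, terms=200):
    """Modified Bessel function of the first kind, order zero, by its power series."""
    total, term = 0.0, 1.0  # term_k = (x/2)^(2k) / (k!)^2, starting at k = 0
    for k in range(1, terms + 1):
        total += term
        term *= (x / 2.0) ** 2 / (k * k)
    return total

def i0_large_x(x):
    """Large-argument approximation exp(x) / sqrt(2*pi*x)."""
    return math.exp(x) / math.sqrt(2.0 * math.pi * x)

# Ratio of the approximation to the series value; it tends to 1 as x grows.
ratios = {x: i0_large_x(x) / bessel_i0(x) for x in (1.0, 5.0, 20.0)}
```

The ratio approaches unity from below as x increases, which is why the approximation is reserved for large values of $E_b/N_0$.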


Phase-Shift Keying 

The PSK signal is applied to a correlator supplied with a phase reference that lies within $\varphi$ radians 
of the exact carrier phase. Determine the effect of the phase error $\varphi$ on the average probability of 
error of the system. 

The signal component of a PSK scheme using coherent detection is defined by 

$$s(t) = A_c k \sin(2\pi f_c t) \pm A_c \sqrt{1 - k^2}\, \cos(2\pi f_c t)$$

where $0 \le t \le T_b$, the plus sign corresponds to symbol 1, and the minus sign corresponds to symbol 
0; the parameter k lies in the range $0 \le k \le 1$. The first term of $s(t)$ represents a carrier component 
included for the purpose of synchronizing the receiver to the transmitter. 

Draw a signal-space diagram for the scheme described here. What observations can you make 
about this diagram? 

Show that, in the presence of AWGN of zero mean and power spectral density $N_0/2$, the average 
probability of error is 

$$P_e = Q\left(\sqrt{\frac{2E_b}{N_0}\,(1 - k^2)}\right)$$

where 

$$E_b = \frac{1}{2} A_c^2 T_b$$

Suppose that 10% of the transmitted signal power is allocated to the carrier component. 
Determine the $E_b/N_0$ required to realize $P_e = 10^{-4}$. 

Compare this value of $E_b/N_0$ with that required for a binary PSK scheme using coherent 
detection, with the same probability of error. 

Given the input binary sequence 1100100010, sketch the waveforms of the in-phase and 
quadrature components of a modulated wave obtained using the QPSK based on the signal set of 
Figure 7.16. 

Sketch the QPSK waveform itself for the input binary sequence specified in part a. 


Let $P_{eI}$ and $P_{eQ}$ denote the probabilities of symbol error for the in-phase and quadrature channels, 
respectively, of a narrowband digital communication system. Show that the average probability of 
symbol error for the overall system is given by 

$$P_e = P_{eI} + P_{eQ} - P_{eI} P_{eQ}$$


Equation (7.132) is an approximate formula for the average probability of symbol error for M-ary 
PSK using coherent detection. This formula was derived using the union bound in light of the signal-space 
diagram of Figure 7.22b. Given that message point $m_1$ was transmitted, show that the 
approximation of (7.132) may be derived directly from Figure 7.22b. 

Find the power spectral density of an offset QPSK signal produced by a random binary sequence in 
which symbols 1 and 0 (represented by ±1) are equally likely and the symbols in different time slots 
are statistically independent and identically distributed. 

Vestigial sideband modulation (VSB), discussed in Chapter 2, offers another possible modulation 
method for signaling over an AWGN channel. 

In particular, a digital VSB transmission system may be viewed as a time-varying one-dimensional 
system operating at a rate of 2/T dimensions per second, where T is the symbol 
period. Justify the validity of this statement. 

Show that digital VSB is indeed equivalent in performance to the offset QPSK. 

Quadrature Amplitude Modulation 

Referring back to Example 7, develop a systematic procedure for constructing M-ary QAM 
constellations given the M-ary QAM constellation of Figure 7.24 for M = 16. In effect, this problem 
addresses the opposite approach to that described in Example 7. 

Figure P7.28 describes the block diagram of a generalized M-ary QAM modulator. Basically, the 
modulator includes a mapper that produces a complex amplitude $a_m$ for $m = 0, 1, \ldots, M-1$. 
The real and imaginary parts of $a_m$ are applied to the basis functions $\phi_1(t)$ and $\phi_2(t)$, respectively. The 
modulator is generalized in that it embodies M-ary PSK and M-ary PAM as special cases. 

Formulate the underlying mathematics of the modulator described in Figure P7.28. 

Hence, show that M-ary PSK and M-ary PAM are indeed special cases of the M-ary QAM 
generated by the block diagram of Figure P7.28. 



Frequency-Shift Keying 

The signal vectors $\mathbf{s}_1$ and $\mathbf{s}_2$ are used to represent binary symbols 1 and 0, respectively, in a binary 
FSK system using coherent detection. The receiver decides in favor of symbol 1 when 

$$\mathbf{x}^{\mathrm{T}} \mathbf{s}_1 > \mathbf{x}^{\mathrm{T}} \mathbf{s}_2$$

where $\mathbf{x}^{\mathrm{T}} \mathbf{s}_i$ is the inner product of the observation vector $\mathbf{x}$ and the signal vector $\mathbf{s}_i$, i = 1, 2. Show that 
this decision rule is equivalent to the condition $x_1 > x_2$, where $x_1$ and $x_2$ are the two elements of the 
observation vector $\mathbf{x}$. Assume that the signal vectors $\mathbf{s}_1$ and $\mathbf{s}_2$ have equal energy. 

An FSK system transmits binary data at the rate of $2.5 \times 10^6$ bits/s. During the course of 
transmission, white Gaussian noise of zero mean and power spectral density $10^{-20}$ W/Hz is added to 
the signal. In the absence of noise, the amplitude of the received sinusoidal wave for digit 1 or 0 is 
1 mV. Determine the average probability of symbol error for the following system configurations: 
binary FSK using coherent detection; 
MSK using coherent detection; 
binary FSK using noncoherent detection. 

In an FSK system using coherent detection, the signals $s_1(t)$ and $s_2(t)$ representing binary symbols 1 
and 0, respectively, are defined by 

$$s_1(t),\, s_2(t) = A_c \cos\left[2\pi\left(f_c \pm \frac{\Delta f}{2}\right)t\right], \qquad 0 \le t \le T_b$$

Assuming that $f_c > \Delta f$, show that the correlation coefficient of the signals $s_1(t)$ and $s_2(t)$ is 
approximately given by 

$$\rho = \frac{\int_0^{T_b} s_1(t)\, s_2(t)\, dt}{\int_0^{T_b} s_1^2(t)\, dt} \approx \operatorname{sinc}(2\Delta f\, T_b)$$

What is the minimum value of frequency shift $\Delta f$ for which the signals $s_1(t)$ and $s_2(t)$ are 
orthogonal? 

What is the value of $\Delta f$ that minimizes the average probability of symbol error? 

For the value of $\Delta f$ obtained in part c, determine the increase in $E_b/N_0$ required so that this FSK 
scheme has the same noise performance as a binary PSK scheme, also using coherent 
detection. 


A binary FSK signal with discontinuous phase is defined by 



for symbol 1 
for symbol 0 


where $E_b$ is the signal energy per bit, $T_b$ is the bit duration, and $\theta_1$ and $\theta_2$ are sample values of 
uniformly distributed random variables over the interval 0 to $2\pi$. In effect, the two oscillators 
supplying the transmitted frequencies $f_c \pm \Delta f/2$ operate independently of each other. Assume that 
$f_c \gg \Delta f$. 

Evaluate the power spectral density of the FSK signal. 

Show that, for frequencies far removed from the carrier frequency $f_c$, the power spectral density 
falls off as the inverse square of frequency. How does this result compare with a binary FSK 
signal with continuous phase? 


Set up a block diagram for the generation of Sunde’s FSK signal s(t) with continuous phase by using 
the representation given in (7.170), which is reproduced here 



Discuss the similarities between MSK and offset QPSK, and the features that distinguish them. 

There are two ways of detecting an MSK signal. One way is to use a coherent receiver to take full 
advantage of the phase information content of the MSK signal. Another way is to use a noncoherent 
receiver and disregard the phase information. The second method offers the advantage of simplicity 
of implementation at the expense of a degraded noise performance. By how many decibels do we 
have to increase the bit energy-to-noise density ratio $E_b/N_0$ in the second method so as to realize the 
same average probability of symbol error equal to $10^{-5}$? 


Sketch the waveforms of the in-phase and quadrature components of the MSK signal in response 
to the input binary sequence 1100100010. 

Sketch the MSK waveform itself for the binary sequence specified in part a. 


An NRZ data stream of amplitude levels ±1 is passed through a low-pass filter whose impulse 
response is defined by the Gaussian function 


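$$h(t) = \frac{\sqrt{\pi}}{\alpha} \exp\left(-\frac{\pi^2 t^2}{\alpha^2}\right)$$

where $\alpha$ is a design parameter defined in terms of the filter's 3 dB bandwidth W by 

$$\alpha = \sqrt{\frac{\ln 2}{2}}\, \frac{1}{W}$$

Show that the transfer function of the filter is defined by 

$$H(f) = \exp(-\alpha^2 f^2)$$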

Hence, demonstrate that the 3dB bandwidth of the filter is indeed equal to W. You may use the 
list of Fourier-transform pairs in Table 2.1. 

Determine the response of the filter to a rectangular pulse of unit amplitude and duration T 
centered on the origin. 

Summarize the similarities and differences between the standard MSK and Gaussian filtered MSK 
signals. 

Summarize the basic similarities and differences between the standard MSK and QPSK. 


Noncoherent Receivers 

In Section 7.12 we derived the formula for the BER of binary FSK using noncoherent detection as a 
special case of noncoherent orthogonal modulation. In this problem we revisit this issue. As before, 
we assume that symbol 1 is represented by signal $s_1(t)$ and symbol 0 is represented by signal $s_2(t)$. 
According to the material presented in Section 7.12, we note the following: 

The random variable $L_2$ represented by the sample value $l_2$ is Rayleigh distributed. 

The random variable $L_1$ represented by the sample value $l_1$ is Rician distributed. 

The Rayleigh and Rician distributions were discussed in Chapter 4. Using the probability 
distributions defined in that chapter, derive (7.245) for the BER of binary FSK, using noncoherent 
detection. 

Figure P7.41a shows a noncoherent receiver using a matched filter for the detection of a sinusoidal 
signal of known frequency but random phase and under the assumption of AWGN. An alternative 
implementation of this receiver is its mechanization in the frequency domain as a spectrum analyzer 
receiver, as in Figure P7.41b, where the correlator computes the finite-time autocorrelation function 
defined by 

$$R_x(\tau) = \int_0^{T-\tau} x(t)\, x(t + \tau)\, dt, \qquad 0 \le \tau \le T$$

Show that the square-law envelope detector output sampled at time t = T in Figure P7.41a is twice 
the spectral output of the Fourier transform sampled at frequency $f = f_c$ in Figure P7.41b. 

The binary sequence 1100100010 is applied to the DPSK transmitter of Figure 7.44. 

Sketch the resulting waveform at the transmitter output. 

Applying this waveform to the DPSK receiver of Figure 7.46, show that in the absence of noise 
the original binary sequence is reconstructed at the receiver output. 

Comparison of Digital Modulation Schemes Using a Single Carrier 

Binary data are transmitted over a microwave link at the rate of $10^6$ bits/s and the power spectral 
density of the noise at the receiver input is $10^{-10}$ W/Hz. Find the average carrier power required to 
maintain an average probability of error $P_e \le 10^{-4}$ for the following schemes: 

Binary PSK using coherent detection; 

DPSK. 

The values of $E_b/N_0$ required to realize an average probability of symbol error $P_e = 10^{-4}$ for binary 
PSK and binary FSK schemes are equal to 7.2 and 13.5, respectively. Using the approximation 

$$Q(u) \approx \frac{\exp(-u^2/2)}{\sqrt{2\pi}\, u}$$

determine the separation in the values of $E_b/N_0$ for $P_e = 10^{-4}$, using: 
binary PSK using coherent detection and DPSK; 
binary PSK and QPSK, both using coherent detection; 
binary FSK using (i) coherent detection and (ii) noncoherent detection; 
binary FSK and MSK, both using coherent detection. 
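
Before relying on the large-argument approximation of the Q-function quoted in this problem, namely exp(-u^2/2) divided by sqrt(2 pi) u, it may help to see how close it is to the exact tail probability. The sketch below is a numerical comparison only; it assumes the standard identity Q(u) = 0.5 erfc(u/sqrt(2)), and the function names are illustrative.

```python
import math

def q_function(u):
    """Gaussian tail probability Q(u) = 0.5 * erfc(u / sqrt(2))."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))

def q_approx(u):
    """Approximation exp(-u^2 / 2) / (sqrt(2*pi) * u), intended for large positive u."""
    return math.exp(-u * u / 2.0) / (math.sqrt(2.0 * math.pi) * u)

# Compare the two at a few arguments; the relative gap shrinks as u grows.
for u in (1.0, 2.0, 4.0):
    print(u, q_function(u), q_approx(u))
```

At the error rates of interest here (around $10^{-4}$) the argument of the Q-function is large enough for the approximation to overestimate the exact value by only a few percent.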

In Section 7.14 we compared the noise performances of various digital modulation schemes under 
the two classes of coherent and noncoherent detection; therein, we used the BER as the basis of 
comparison. In this problem we take a different viewpoint and use the average probability of symbol 
error P e , to do the comparison. Plot P e versus E b /N 0 for each of these schemes and comment on 
your results. 


Synchronization 

Demonstrate the equivalence of the two complex representations given in (7.276) and (7.277), which 
pertain to the likelihood function. 

In the recursive algorithm of (7.295) for symbol timing recovery, the control signals $c_n$ and $c_{n+1}$ 
are both dimensionless. Discuss the units in which the error signal $e_n$ and step-size parameter $\mu$ 
are measured. 

In the recursive algorithm of (7.300) for phase recovery, the old estimate $\theta_n$ and the updated 
estimate $\theta_{n+1}$ of the carrier phase $\theta$ are both measured in radians. Discuss the units in which the 
error signal $e_n$ and step-size parameter $\mu$ are measured. 

The binary PSK is a special case of QPSK. Using the adaptive filtering algorithms derived in Section 
7.16 for estimating the group delay $\tau$ and carrier phase $\theta$, find the corresponding adaptive filtering 
algorithms for binary PSK. 

Repeat Problem 7.48, but this time find the adaptive filtering algorithms for M-ary PSK. 

Suppose we transmit a sequence of $L_0$ statistically independent symbols of a QPSK signal, as shown 
by 


where $L_0$ is not to be confused with the symbol for average log-likelihood $L_{av}$. The channel output is 
corrupted by AWGN of zero mean and power spectral density $N_0/2$, carrier phase $\theta$, and unknown 
group delay $\tau$. 

Determine the likelihood function with respect to the group delay $\tau$, assuming that $\theta$ is uniformly 
distributed. 

Hence, formulate the maximum likelihood estimate of the group delay $\tau$. 

Compare this feedforward scheme of group-delay estimation with that provided by the NDA-ELD 
synchronizer of Figure 7.48. 

Repeat Problem 7.50, but this time do the following: 

Determine the likelihood function with respect to the carrier phase $\theta$, assuming that the group 
delay $\tau$ is known. 

Hence, formulate the maximum likelihood estimate of the carrier phase $\theta$. 

Compare this feedforward scheme of carrier-phase estimation with the recursive Costas loop of 
Figure 7.49. 

In Section 7.16 we studied a nondata-aided scheme for carrier phase recovery, based on the 
log-likelihood function of (7.296). In this problem we explore the use of this equation for data-aided 
carrier phase recovery. 

Consider a receiver designed for a linear modulation system. Given that the receiver has 
knowledge of a preamble of length L 0 , show that the maximum likelihood estimate of the carrier 
phase is defined by 


$$\hat{\theta} = \arg\left( \sum_{n=0}^{L_0-1} a_n^{*}\, x_n \right)$$

where the preamble $\{a_n\}_{n=0}^{L_0-1}$ is a known sequence of complex symbols and $\{x_n\}_{n=0}^{L_0-1}$ is the 
complex envelope of the corresponding received signal. 

Using the result derived in part a, construct a block diagram for the maximum likelihood phase 
estimator. 

Figure P7.53 shows the block diagram of a phase-synchronization system. Determine the phase 
estimate $\hat{\theta}$ of the unknown carrier phase in the received signal $x(t)$. 

Computer Experiments 

In this computer-oriented problem, we study the operation of the NDA-ELD synchronizer for 
symbol timing recovery by considering a coherent QPSK system with the following specifications: 
The channel response is described by a raised cosine pulse with rolloff factor $\alpha = 0.5$. 

The recursive filter is a first-order digital filter with transfer function 

$$H(z) = \frac{z^{-1}}{1 - (1 - \gamma A)\, z^{-1}}$$

where $z^{-1}$ denotes unit delay, $\gamma$ is the step-size parameter, and A is a parameter, to be defined. 
The loop bandwidth $B_L$ is 2% of the symbol rate 1/T, that is, $B_L T = 0.02$. 

With symbol timing recovery as the objective, a logical way to proceed is to plot the S-curve for the 
NDA-ELD under the following conditions: 

$E_b/N_0 = 10$ dB 

$E_b/N_0 = \infty$ (i.e., noiseless channel). 

For NDA-ELD, the scheme shown in Figure P7.54 is responsible for generating the S-curve that 
plots the timing offset versus the discrete time n = t/T. 

Using this scheme, plot the S-curves, and comment on the results obtained for parts a and b. 



In this follow-up to the computer-oriented Problem 7.54, we study the recursive Costas loop for 
phase recovery using the same system specifications described in Problem 7.54. This time, however, 
we use the scheme of Figure P7.54 for measuring the S-curve to plot the phase error versus discrete- 
time n = t/T. 


The plot is to be carried out under the following conditions: 
$E_b/N_0 = 5$ dB 
$E_b/N_0 = 10$ dB 
$E_b/N_0 = 30$ dB (i.e., practically noiseless channel) 
Comment on the results obtained for these three conditions. 

Notes 


1. The geometric representation of signals was first developed by Kotel'nikov (1947) which is a 
translation of the original doctoral dissertation presented in January 1947 before the Academic 
Council of the Molotov Energy Institute in Moscow. In particular, see Part II of the book. This method 
was subsequently brought to fuller fruition in the classic book by Wozencraft and Jacobs (1965). 

2. The classic reference for the union bound is Wozencraft and Jacobs (1965). 

3. Appendix C addresses the derivation of simple bounds on the Q-function. In (7.88), we have used 
the following bound: 


which becomes increasingly tight for large positive values of x. 

4. For an early paper on the offset QPSK, see Gitlin and Ho (1975). 

5. The MSK signal was first described in Doelz and Heald (1961). For a tutorial review of MSK and 
comparison with QPSK, see Pasupathy (1979). Since the frequency spacing is only half as much as 
the conventional spacing of 1/T b that is used in the coherent detection of binary FSK signals, this 
signaling scheme is also referred to as fast FSK; see deBuda (1972), who was not aware of the 
Doelz-Heald patent. 

6. For early discussions of GMSK, see Murota and Hirade (1981) and Ishizuke and Hirade (1980). 

7. The analytical specification of the power spectral density of digital FM is difficult to handle, 
except for the case of a rectangular shaped modulating pulse. The paper by Garrison (1975) presents 
a procedure based on the selection of an appropriate duration-limited/level-quantized approximation 
for the modulating pulse. The equations developed therein are particularly suitable for machine 
computation of the power spectra of digital FM signals; see the book by Stüber (1996). 

8. A detailed analysis of the spectra of M-ary FSK for an arbitrary value of frequency deviation is 
presented in the paper by Anderson and Salz (1965). 

9. Readers who are not interested in the formal derivation of (7.227) may at this point wish to move 
on to the treatment of noncoherent binary FSK (in Section 7.12) and DPSK (in Section 7.13), two 
special cases of noncoherent orthogonal modulation, without loss of continuity. 

10. The standard method of deriving the BER for noncoherent binary FSK, presented in 
McDonough and Whalen (1995) and that for DPSK presented in Arthurs and Dym (1962), involves 
the use of the Rician distribution. This distribution arises when the envelope of a sine wave plus 
additive Gaussian noise is of interest; see Chapter 4 for a discussion of the Rician distribution. The 
derivations presented herein avoid the complications encountered in the standard method. 

11. The optimum receiver for differential phase-shift keying is discussed in Simon and Divsalar 
(1992). 

12. For detailed treatment of the algorithmic approach for solving the synchronization problem in 
signaling over AWGN channels, the reader is referred to the books by Mengali and D'Andrea (1997) 
and Meyer et al. (1998). For books on the traditional approach to synchronization, the reader is 
referred to Lindsey and Simon (1973). 



Signaling over Band-Limited 
Channels 


Introduction 


In Chapter 7 we focused attention on signaling over a channel that is assumed to be 
distortionless except for the AWGN at the channel output. In other words, there was no 
limitation imposed on the channel bandwidth, with the energy per bit to noise spectral 
density ratio $E_b/N_0$ being the only factor to affect the performance of the receiver. In 
reality, however, every physical channel is not only noisy, but also limited to some finite 
bandwidth. Hence the title of this chapter: signaling over band-limited channels. 

The important point to note here is that if, for example, a rectangular pulse, representing 
one bit of information, is applied to the channel input, the shape of the pulse will be 
distorted at the channel output. Typically, the distorted pulse may consist of a main lobe 
representing the original bit of information surrounded by a long sequence of sidelobes on 
each side of the main lobe. The sidelobes represent a new source of channel distortion, 
referred to as intersymbol interference, so called because of its degrading influence on the 
adjacent bits of information. 

There is a fundamental difference between intersymbol interference and channel noise 
that could be summarized as follows: 

• Channel noise is independent of the transmitted signal; its effect on data 
transmission over the band-limited channel shows up at the receiver input, once the 
data transmission system is switched on. 

• Intersymbol interference, on the other hand, is signal dependent; it disappears only 
when the transmitted signal is switched off. 

In Chapter 7, channel noise was considered all by itself so as to develop a basic 
understanding of how its presence affects receiver performance. It is logical, therefore, 
that in the sequel to that chapter, we initially focus on intersymbol interference acting 
alone. In practical terms, we may justify a noise-free condition by assuming that the SNR 
is high enough to ignore the effect of channel noise. The study of signaling over a band- 
limited channel, under the condition that the channel is effectively “noiseless,” occupies 
the first part of the chapter. The objective here is that of signal design, whereby the effect 
of intersymbol interference is reduced to zero. 

The second part of the chapter focuses on a noisy wideband channel. In this case, data 
transmission over the channel is tackled by dividing it into a number of subchannels, with 
each subchannel being narrowband enough to permit the application of Shannon’s 
information capacity law that was considered in Chapter 5. The objective here is that of 
system design, whereby the rate of data transmission through the system is maximized to 
the highest level physically possible. 

Error Rate Due to Channel Noise in a Matched-Filter Receiver 


We begin the study of signaling over band-limited channels by determining the operating 
conditions that would permit us to view the channel to be effectively “noiseless.” To this 
end, consider the block diagram of Figure 8.1, which depicts the following data-transmission 
scenario: a binary data stream is applied to a noisy channel where the 
additive channel noise $w(t)$ is modeled as white and Gaussian with zero mean and power 
spectral density $N_0/2$. The data stream is based on polar NRZ signaling, in which symbols 
1 and 0 are represented by positive and negative rectangular pulses of amplitude A and 
duration $T_b$. In the signaling interval $0 \le t \le T_b$, the received signal is defined by 

$$x(t) = \begin{cases} +A + w(t), & \text{symbol 1 was sent} \\ -A + w(t), & \text{symbol 0 was sent} \end{cases}$$

The receiver operates synchronously with the transmitter, which means that the matched 
filter at the front end of the receiver has knowledge of the starting and ending times of 
each transmitted pulse. The matched filter is followed by a sampler, and then finally a 
decision device. To simplify matters, it is assumed that the symbols 1 and 0 are equally 
likely; the threshold in the decision device, namely $\lambda$, may then be set equal to zero. If 
this threshold is exceeded, the receiver decides in favor of symbol 1; if not, it decides in 
favor of symbol 0. A random choice is made in the case of a tie. 

Following the geometric signal-space theory presented in Section 7.6 on binary PSK, 
the transmitted signal constellation consists of a pair of message points located at $+\sqrt{E_b}$ 
and $-\sqrt{E_b}$. The energy per bit is defined by 

$$E_b = A^2 T_b$$

The only basis function of the signal-space diagram is a rectangular pulse defined as follows: 

$$\phi(t) = \begin{cases} 1/\sqrt{T_b}, & \text{for } 0 \le t \le T_b \\ 0, & \text{otherwise} \end{cases}$$

Figure 8.1 Receiver for baseband transmission of binary-encoded data stream using polar 
NRZ signaling. The decision device says 1 if $y > \lambda$ and says 0 if $y < \lambda$. 

Figure 8.2 Probability of error in the 
signaling scheme of Figure 8.1. 



In mathematical terms, the form of signaling embodied in Figure 8.1 is equivalent to that 
of binary PSK. Following (7.109), the average probability of symbol error incurred by the 
matched-filter receiver in Figure 8.1 is therefore defined by the Q-function 

$$P_e = Q\left(\sqrt{\frac{2E_b}{N_0}}\right) \qquad (8.3)$$

Although this result for NRZ-signaling over an AWGN channel may seem to be special, 
(8.3) holds for a binary data transmission system where symbol 1 is represented by a 
generic pulse g(t) and symbol 0 is represented by -g(t) under the assumption that the 
energy contained in g(t) is equal to E b . This statement follows from matched-filter theory 
presented in Chapter 7. 

Figure 8.2 plots $P_e$ versus the dimensionless SNR, $E_b/N_0$. The important message to 
take from this figure is summed up as follows: 



For example, expressing $E_b/N_0$ in decibels, we see from Figure 8.2 that $P_e$ is on the
order of $10^{-6}$ when $E_b/N_0$ = 10 dB. Such a value of $P_e$ is small enough to say that
the effect of the channel noise is ignorable.
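
The figure of 10 dB quoted above can be checked numerically. The following Python sketch
evaluates (8.3) using the standard identity Q(x) = (1/2) erfc(x/√2); the function names
`q_function` and `polar_nrz_error_probability` are illustrative choices, not notation from
the text.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def polar_nrz_error_probability(ebn0_db: float) -> float:
    """Evaluate P_e = Q(sqrt(2 E_b / N_0)) of (8.3) for E_b/N_0 given in dB."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)  # decibels -> linear ratio
    return q_function(math.sqrt(2.0 * ebn0))


if __name__ == "__main__":
    for snr_db in (0, 4, 8, 10, 12):
        pe = polar_nrz_error_probability(snr_db)
        print(f"E_b/N_0 = {snr_db:2d} dB  ->  P_e = {pe:.3e}")
```

At 10 dB the computed value falls between $10^{-6}$ and $10^{-5}$, in line with the reading
taken from Figure 8.2.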

Henceforth, in the first part of the chapter dealing with signaling over band-limited
channels, we assume that the SNR, $E_b/N_0$, is large enough to leave intersymbol
interference as the only source of interference.

Intersymbol Interference 


To proceed with a mathematical study of intersymbol interference, consider a baseband
binary PAM system, a generic form of which is depicted in Figure 8.3. The term
“baseband” refers to an information-bearing signal whose spectrum extends from (or near)
zero up to some finite value for positive frequencies. Thus, with the input data stream
being a baseband signal, the data-transmission system of Figure 8.3 is said to be a
baseband system. Consequently, unlike the subject matter studied in Chapter 7, there is no
carrier modulation in the transmitter and, therefore, no carrier demodulation in the
receiver to be considered.

Figure 8.3 Baseband binary data transmission system, comprising transmitter, channel
(with additive Gaussian noise w(t)), and receiver. The receiver samples under the control
of clock pulses, and its decision device says 1 if $y(t_i)$ > λ and says 0 if $y(t_i)$ < λ.

Next, addressing the choice of discrete PAM, we say that this form of pulse modulation 
is one of the most efficient schemes for data transmission over a baseband channel when 
the utilization of both transmit power and channel bandwidth is of particular concern. In 
this section, we consider the simple case of binary PAM. 

Referring back to Figure 8.3, the pulse-amplitude modulator changes the input binary
data stream $\{b_k\}$ into a new sequence of short pulses, short enough to approximate
impulses. More specifically, the pulse amplitude $a_k$ is represented in the polar form:

$$a_k = \begin{cases} +1 & \text{if } b_k \text{ is symbol 1} \\ -1 & \text{if } b_k \text{ is symbol 0} \end{cases} \tag{8.4}$$

The sequence of short pulses so produced is applied to a transmit filter whose impulse
response is denoted by g(t). The transmitted signal is thus defined by the sequence

$$s(t) = \sum_{k} a_k\, g(t - kT_b) \tag{8.5}$$

Equation (8.5) is a form of linear modulation, which may be stated in words as follows:


The signal s(t) is naturally modified as a result of transmission through the channel whose 
impulse response is denoted by h(t). The noisy received signal x(t) is passed through a 
receive filter of impulse response c(t). The resulting filter output y(t) is sampled 
synchronously with the transmitter, with the sampling instants being determined by a clock 
or timing signal that is usually extracted from the receive-filter output. Finally, the 
sequence of samples thus obtained is used to reconstruct the original data sequence by 
means of a decision device. Specifically, the amplitude of each sample is compared with a 
zero threshold, assuming that the symbols 1 and 0 are equiprobable. If the zero threshold is 
exceeded, a decision is made in favor of symbol 1; otherwise a decision is made in favor of
symbol 0. If the sample amplitude equals the zero threshold exactly, the receiver simply
makes a random guess.

Except for a trivial scaling factor, we may now express the receive filter output as

$$y(t) = \sum_{k} a_k\, p(t - kT_b) \tag{8.6}$$

where the pulse p(t) is to be defined. To be precise, an arbitrary time delay $t_0$ should be
included in the argument of the pulse $p(t - kT_b)$ in (8.6) to represent the effect of
transmission delay through the system. To simplify the exposition, we have put this delay
equal to zero in (8.6) without loss of generality; moreover, the channel noise is ignored.

The scaled pulse p(t) is obtained by a double convolution involving the impulse
response g(t) of the transmit filter, the impulse response h(t) of the channel, and the
impulse response c(t) of the receive filter, as shown by

$$p(t) = g(t) \star h(t) \star c(t) \tag{8.7}$$

where, as usual, the star denotes convolution. We assume that the pulse p(t) is normalized
by setting

$$p(0) = 1 \tag{8.8}$$

which justifies the use of a scaling factor to account for amplitude changes incurred in the
course of signal transmission through the system.

Since convolution in the time domain is transformed into multiplication in the
frequency domain, we may use the Fourier transform to change (8.7) into the equivalent
form

$$P(f) = G(f)\,H(f)\,C(f) \tag{8.9}$$

where P(f), G(f), H(f), and C(f) are the Fourier transforms of p(t), g(t), h(t), and c(t),
respectively.

The receive filter output y(t) is sampled at time $t_i = iT_b$, where i takes on integer values;
hence, we may use (8.6) to write

$$\begin{aligned} y(t_i) &= \sum_{k=-\infty}^{\infty} a_k\, p[(i-k)T_b] \\ &= a_i + \sum_{\substack{k=-\infty \\ k \ne i}}^{\infty} a_k\, p[(i-k)T_b] \end{aligned} \tag{8.10}$$

In (8.10), the first term $a_i$ represents the contribution of the ith transmitted bit. The second
term represents the residual effect of all other transmitted bits on the decoding of the ith
bit. This residual effect due to the occurrence of pulses before and after the sampling
instant $t_i$ is called intersymbol interference (ISI).
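
The two-term split in (8.10) can be illustrated with a short Python sketch. The pulse
samples below are hypothetical values chosen only to produce some ISI, the symbol sequence
is finite rather than doubly infinite, and the function name `receive_sample` is an
illustrative choice.

```python
def receive_sample(symbols, pulse_samples, i):
    """Split y(t_i) of (8.10) into the desired term a_i * p(0) and the ISI term.

    symbols       -- list of polar amplitudes a_k in {+1, -1}
    pulse_samples -- dict mapping the integer m = i - k to p(m * T_b), with p(0) = 1
    """
    desired = symbols[i] * pulse_samples[0]
    isi = sum(
        a_k * pulse_samples.get(i - k, 0.0)
        for k, a_k in enumerate(symbols)
        if k != i
    )
    return desired, isi


# Polar amplitudes for the bit pattern 1011010 (+1 for symbol 1, -1 for symbol 0).
symbols = [+1, -1, +1, +1, -1, +1, -1]
# Hypothetical samples p(m T_b) of a pulse that does not vanish at m != 0.
pulse_samples = {0: 1.0, 1: 0.2, -1: 0.1, 2: -0.05}

desired, isi = receive_sample(symbols, pulse_samples, i=3)
print(f"desired = {desired:+.2f}, ISI = {isi:+.2f}, y(t_i) = {desired + isi:+.2f}")
```

With these assumed numbers the ISI term is small relative to the desired term, so the sign
of the sample, and hence the decision, is unaffected; larger pulse samples at m ≠ 0 would
erode that margin.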

In the absence of ISI (and, of course, channel noise) we observe from (8.10) that the
summation term is zero, thereby reducing the equation to

$$y(t_i) = a_i$$

which shows that, under these ideal conditions, the ith transmitted bit is decoded correctly.

Signal Design for Zero ISI


The primary objective of this chapter is to formulate an overall pulse shape p(t) so as to
mitigate the ISI problem, given the impulse response of the channel h(t). With this
objective in mind, we may now state the problem at hand:


In effect, signaling over the band-limited channel becomes distortionless; hence, we may 
refer to the pulse-shaping requirement as a signal-design problem. 

In the next section we describe a signal-design procedure, whereby overlapping pulses
in the binary data-transmission system of Figure 8.3 are configured in such a way that at
the receiver output they do not interfere with each other at the sampling times $t_i = iT_b$. So
long as the reconstruction of the original binary data stream is accomplished, the behavior
of the overlapping pulses outside these sampling times is clearly of no practical
consequence. Such a design procedure is rooted in the criterion for distortionless
transmission, which was formulated by Nyquist (1928b) in his work on telegraph
transmission theory, a theory that is as valid today as it was then.

Referring to (8.10), we see that the weighted pulse contribution, $a_k\,p(iT_b - kT_b)$, must
be zero for all k except for k = i for binary data transmission across the band-limited
channel to be ISI free. In other words, the overall pulse shape p(t) must be designed to
satisfy the requirement

$$p(iT_b - kT_b) = \begin{cases} 1, & i = k \\ 0, & i \ne k \end{cases} \tag{8.11}$$

where p(0) is set equal to unity in accordance with the normalization condition of (8.8). A
pulse p(t) that satisfies the two-part condition of (8.11) is called a Nyquist pulse, and the
condition itself is referred to as Nyquist's criterion for distortionless binary baseband data
transmission. However, there is no unique Nyquist pulse; rather, there are many pulse
shapes that satisfy the Nyquist criterion of (8.11). In the next section we describe two
kinds of Nyquist pulses, each with its own attributes.
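
As a minimal sketch of how the two-part condition of (8.11) can be tested on a candidate
pulse, the Python fragment below samples a pulse at multiples of the bit duration and
checks for a unit value at the origin and zeros elsewhere. The two test pulses (triangular
and wide rectangular) are hypothetical examples introduced here, and only a finite span of
sampling instants is examined.

```python
def satisfies_nyquist(pulse, t_b, span=20, tol=1e-9):
    """Check (8.11) on the samples p(m * T_b) for |m| <= span."""
    if abs(pulse(0.0) - 1.0) > tol:
        return False
    return all(
        abs(pulse(m * t_b)) <= tol
        for m in range(-span, span + 1)
        if m != 0
    )


def triangular_pulse(t, t_b=1.0):
    """Hypothetical pulse: unit peak at t = 0, falling linearly to zero at |t| = T_b."""
    return max(0.0, 1.0 - abs(t) / t_b)


def wide_rectangular_pulse(t, t_b=1.0):
    """Hypothetical pulse of unit height spanning three bit durations."""
    return 1.0 if abs(t) <= 1.5 * t_b else 0.0


print(satisfies_nyquist(triangular_pulse, 1.0))        # zero at every other sampling instant
print(satisfies_nyquist(wide_rectangular_pulse, 1.0))  # still nonzero at t = +/- T_b
```

The triangular pulse passes because it has already fallen to zero at the neighboring
sampling instants, whereas the wide rectangular pulse spills into them.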

Ideal Nyquist Pulse for Distortionless Baseband 
Data Transmission 


From a design point of view, it is informative to transform the two-part condition of (8.11)
into the frequency domain. Consider then the sequence of samples $\{p(nT_b)\}$, where n = 0,
±1, ±2, .... From the discussion presented in Chapter 6 on the sampling process, we recall
that sampling in the time domain produces periodicity in the frequency domain. In
particular, we may write

$$P_\delta(f) = R_b \sum_{n=-\infty}^{\infty} P(f - nR_b) \tag{8.12}$$

where $R_b = 1/T_b$ is the bit rate in bits per second; $P_\delta(f)$ on the left-hand side of (8.12)
is the Fourier transform of an infinite periodic sequence of delta functions of period $T_b$
whose individual areas are weighted by the respective sample values of p(t). That is,
$P_\delta(f)$ is given by

$$P_\delta(f) = \int_{-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \left[ p(mT_b)\,\delta(t - mT_b) \right] \exp(-j2\pi f t)\, dt \tag{8.13}$$

Let the integer m = i - k. Then, i = k corresponds to m = 0 and, likewise, i ≠ k
corresponds to m ≠ 0. Accordingly, imposing the conditions of (8.11) on the sample
values of p(t) in the integral in (8.13), we get

$$P_\delta(f) = p(0) \int_{-\infty}^{\infty} \delta(t) \exp(-j2\pi f t)\, dt = p(0) \tag{8.14}$$

where we have made use of the sifting property of the delta function. Since from (8.8) we
have p(0) = 1, it follows from (8.12) and (8.14) that the frequency-domain condition for
zero ISI is satisfied, provided that

$$\sum_{n=-\infty}^{\infty} P(f - nR_b) = T_b \tag{8.15}$$

where $T_b = 1/R_b$. We may now make the following statement on the Nyquist criterion for
distortionless baseband transmission in the frequency domain:


Note that P(f) refers to the overall system, incorporating the transmit filter, the channel, 
and the receive filter in accordance with (8.9). 


The simplest way of satisfying (8.15) is to specify the frequency function P(f) to be in the
form of a rectangular function, as shown by

$$P(f) = \begin{cases} \dfrac{1}{2W}, & -W < f < W \\ 0, & |f| > W \end{cases} \;=\; \frac{1}{2W}\,\mathrm{rect}\!\left(\frac{f}{2W}\right) \tag{8.16}$$

where rect(f) stands for a rectangular function of unit amplitude and unit support centered
on f = 0 and the overall baseband system bandwidth W is defined by

$$W = \frac{R_b}{2} = \frac{1}{2T_b} \tag{8.17}$$

According to the solution in (8.16), no frequencies of absolute value exceeding half the bit
rate are needed. Hence, from Fourier-transform pair 1 of Table 2.2 in Chapter 2, we find
that a signal waveform that produces zero ISI is defined by the sinc function:

$$p(t) = \frac{\sin(2\pi W t)}{2\pi W t} = \mathrm{sinc}(2Wt) \tag{8.18}$$

The special value of the bit rate $R_b = 2W$ is called the Nyquist rate and W is itself called the
Nyquist bandwidth. Correspondingly, the baseband pulse p(t) for distortionless
transmission described in (8.18) is called the ideal Nyquist pulse, ideal in the sense that the
bandwidth requirement is one half the bit rate.

Figure 8.4 shows plots of P(f) and p(t). In part a of the figure, the normalized form of
the frequency function P(f) is plotted for positive and negative frequencies. In part b of
the figure, we have also included the signaling intervals and the corresponding centered
sampling instants. The function p(t) can be regarded as the impulse response of an ideal
low-pass filter with passband magnitude response 1/(2W) and bandwidth W. The function
p(t) has its peak value at the origin and goes through zero at integer multiples of the bit
duration $T_b$. It is apparent, therefore, that if the received waveform y(t) is sampled at the
instants of time t = 0, ±$T_b$, ±2$T_b$, ..., then the pulses defined by $a_i\,p(t - iT_b)$ with amplitude
$a_i$ and index i = 0, ±1, ±2, ... will not interfere with each other. This condition is illustrated
in Figure 8.5 for the binary sequence 1011010.

Figure 8.4 (a) Ideal magnitude response, plotted as 2WP(f). (b) Ideal basic pulse shape,
with the signaling intervals and sampling instants marked.

Figure 8.5 A series of sinc pulses corresponding to the sequence 1011010 (amplitude
versus time).
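
The situation pictured in Figure 8.5 can be reproduced numerically. The sketch below
superposes ideal Nyquist pulses for the sequence 1011010, samples the sum at t = i$T_b$, and
applies the zero threshold. It assumes $T_b$ = 1, no noise, and perfect timing; the function
names are illustrative.

```python
import math


def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def received_waveform(t, amplitudes, t_b=1.0):
    """y(t) = sum_i a_i p(t - i T_b) with p(t) = sinc(2Wt) = sinc(t / T_b)."""
    return sum(a * sinc((t - i * t_b) / t_b) for i, a in enumerate(amplitudes))


bits = [1, 0, 1, 1, 0, 1, 0]
amplitudes = [1 if b else -1 for b in bits]          # polar mapping of (8.4)

samples = [received_waveform(i * 1.0, amplitudes) for i in range(len(bits))]
decisions = [1 if y > 0 else 0 for y in samples]     # zero-threshold decision device

print(decisions == bits)
```

Although the seven pulses overlap heavily between sampling instants, each sample equals
its own amplitude to within rounding error, so the decisions reproduce the transmitted bits.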


Although the use of the ideal Nyquist pulse does indeed achieve economy in
bandwidth, in that it solves the problem of zero ISI with the minimum bandwidth possible,
there are two practical difficulties that make it an undesirable objective for signal design:

1. It requires that the magnitude characteristic of P(f) be flat from -W to +W, and zero
   elsewhere. This is physically unrealizable because of the abrupt transitions at the
   band edges ±W, in that the Paley-Wiener criterion discussed in Chapter 2 is violated.
2. The pulse function p(t) decreases as 1/|t| for large |t|, resulting in a slow rate of
   decay. This is also caused by the discontinuity of P(f) at ±W. Accordingly, there is
   practically no margin of error in sampling times in the receiver.

To evaluate the effect of the timing error alluded to under point 2, consider the sample of
y(t) at t = $\Delta t$, where $\Delta t$ is the timing error. To simplify the exposition, we may put the
correct sampling time $t_i$ equal to zero. In the absence of noise, we thus have from the first
line of (8.10):

$$y(\Delta t) = \sum_{k=-\infty}^{\infty} a_k\, p(\Delta t - kT_b) = \sum_{k=-\infty}^{\infty} a_k\, \frac{\sin[2\pi W(\Delta t - kT_b)]}{2\pi W(\Delta t - kT_b)} \tag{8.19}$$

Since $2WT_b = 1$, by definition, we may reduce (8.19) to

$$y(\Delta t) = a_0\,\mathrm{sinc}(2W\Delta t) + \frac{\sin(2\pi W \Delta t)}{\pi} \sum_{\substack{k=-\infty \\ k \ne 0}}^{\infty} \frac{(-1)^k a_k}{2W\Delta t - k} \tag{8.20}$$


The first term on the right-hand side of (8.20) defines the desired symbol, whereas the
remaining series represents the ISI caused by the timing error $\Delta t$ in sampling the receiver
output y(t). Unfortunately, it is possible for this series to diverge, thereby causing the
receiver to make erroneous decisions.
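
The possibility of divergence can be made concrete with a small numerical experiment. The
sketch below evaluates the magnitude of the ISI series in (8.20) for the least favorable
symbol pattern, in which every $a_k$ is chosen so that all terms add with the same sign, and
truncates the series to |k| ≤ K. The normalized timing error `delta` stands for
2W$\Delta t$ = $\Delta t/T_b$; this normalization and the function name are assumptions made
for the illustration.

```python
import math


def worst_case_timing_isi(delta, num_terms):
    """Worst-case magnitude of the ISI series in (8.20), truncated to |k| <= num_terms.

    delta is the normalized timing error 2 * W * dt = dt / T_b, with |delta| < 1.
    """
    scale = abs(math.sin(math.pi * delta)) / math.pi
    total = sum(
        1.0 / abs(delta - k)
        for k in range(-num_terms, num_terms + 1)
        if k != 0
    )
    return scale * total


if __name__ == "__main__":
    for k_max in (10, 100, 1000, 10000):
        value = worst_case_timing_isi(0.1, k_max)
        print(f"K = {k_max:5d}: worst-case ISI = {value:.3f}")
```

Because the terms fall off only as 1/|k|, the truncated sum keeps growing, roughly in
proportion to the logarithm of K, for any nonzero timing error. This is the slow decay of
the sinc pulse showing up as sensitivity to timing.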

Raised-Cosine Spectrum 


We may overcome the practical difficulties encountered with the ideal Nyquist pulse by
extending the bandwidth from the minimum value $W = R_b/2$ to an adjustable value
between W and 2W. In effect, we are trading off increased channel bandwidth for a more
robust signal design that is tolerant of timing errors. Specifically, the overall frequency
response P(f) is designed to satisfy a condition more stringent than that for the ideal
Nyquist pulse, in that we retain three terms of the summation on the left-hand side of
(8.15) and restrict the frequency band of interest to [-W, W], as shown by

$$P(f) + P(f - 2W) + P(f + 2W) = \frac{1}{2W}, \qquad -W \le f \le W \tag{8.21}$$

where, on the right-hand side, we have set $T_b = 1/(2W)$ in accordance with (8.17). We may
now devise several band-limited functions that satisfy (8.21). A particular form of P(f)
that embodies many desirable features is provided by a raised-cosine (RC) spectrum. This
frequency response consists of a flat portion and a roll-off portion that has a sinusoidal
form, as shown by:

$$P(f) = \begin{cases} \dfrac{1}{2W}, & 0 \le |f| < f_1 \\[2ex] \dfrac{1}{4W}\left\{ 1 + \cos\!\left[ \dfrac{\pi}{2W\alpha}\,(|f| - f_1) \right] \right\}, & f_1 \le |f| < 2W - f_1 \\[2ex] 0, & |f| \ge 2W - f_1 \end{cases} \tag{8.22}$$

In (8.22), we have introduced a new frequency $f_1$ and a dimensionless parameter $\alpha$, which
are related by

$$\alpha = 1 - \frac{f_1}{W} \tag{8.23}$$

The parameter $\alpha$ is commonly called the roll-off factor; it indicates the excess bandwidth
over the ideal solution, W. Specifically, the new transmission bandwidth is defined by

$$B_T = 2W - f_1 = W(1 + \alpha) \tag{8.24}$$

The frequency response P(f), normalized by multiplying it by the factor 2W, is plotted in
Figure 8.6a for $\alpha$ = 0, 0.5, and 1. We see that for $\alpha$ = 0.5 or 1, the frequency response P(f)
rolls off gradually compared with the ideal Nyquist pulse (i.e., $\alpha$ = 0) and it is therefore
easier to implement in practice. This roll-off is cosine-like in shape, hence the terminology
“RC spectrum.” Just as importantly, P(f) exhibits odd symmetry with respect to the
Nyquist bandwidth W, which makes it possible to satisfy the frequency-domain condition
of (8.15).

Figure 8.6 Responses for different roll-off factors: (a) frequency response; (b) time response.
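
The claim that the RC spectrum satisfies the frequency-domain condition of (8.15) can be
checked numerically. The sketch below codes the three-part definition of the RC spectrum
in (8.22), with $f_1$ = W(1 - $\alpha$), and sums shifted copies spaced by $R_b$ = 2W. The
choice W = 0.5 (so that $T_b$ = 1), the number of shifted copies, and the function names are
assumptions for the illustration; $\alpha$ = 0 is left out because the discontinuous
rectangular spectrum needs separate care at the band edges.

```python
import math


def rc_spectrum(f, w, alpha):
    """Raised-cosine spectrum P(f) of (8.22) with f_1 = W (1 - alpha), alpha > 0."""
    f1 = w * (1.0 - alpha)
    af = abs(f)
    if af < f1:
        return 1.0 / (2.0 * w)
    if af < 2.0 * w - f1:
        return (1.0 + math.cos(math.pi * (af - f1) / (2.0 * w * alpha))) / (4.0 * w)
    return 0.0


def folded_spectrum(f, w, alpha, terms=3):
    """Left-hand side of (8.15): sum of P(f - n R_b) with R_b = 2W, |n| <= terms."""
    r_b = 2.0 * w
    return sum(rc_spectrum(f - n * r_b, w, alpha) for n in range(-terms, terms + 1))


w = 0.5                      # Nyquist bandwidth; T_b = 1 / (2W) = 1
t_b = 1.0 / (2.0 * w)
for alpha in (0.25, 0.5, 1.0):
    worst = max(abs(folded_spectrum(j * w / 100.0, w, alpha) - t_b)
                for j in range(-100, 101))
    print(f"alpha = {alpha}: max deviation from T_b = {worst:.2e}")
```

The deviation stays at rounding-error level across the band, which is the odd symmetry
about W at work: whatever the roll-off removes below W is restored by the shifted copy.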


The time response p(t) is naturally the inverse Fourier transform of the frequency
response P(f). Hence, transforming the P(f) defined in (8.22) into the time domain, we
obtain

$$p(t) = \mathrm{sinc}(2Wt)\, \frac{\cos(2\pi\alpha W t)}{1 - 16\alpha^2 W^2 t^2} \tag{8.25}$$

which is plotted in Figure 8.6b for $\alpha$ = 0, 0.5, and 1.

The time response p(t) consists of the product of two factors: the factor sinc(2Wt)
characterizing the ideal Nyquist pulse and a second factor that decreases as $1/|t|^2$ for large
|t|. The first factor ensures zero crossings of p(t) at the desired sampling instants of time
t = i$T_b$, with i equal to an integer (positive and negative). The second factor reduces the
tails of the pulse considerably below those obtained from the ideal Nyquist pulse, so that
the transmission of binary data using such pulses is relatively insensitive to sampling time
errors. In fact, for $\alpha$ = 1 we have the most gradual roll-off, in that the amplitudes of the
oscillatory tails of p(t) are smallest. Thus, the amount of ISI resulting from timing error
decreases as the roll-off factor $\alpha$ is increased from zero to unity.
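
A minimal Python sketch of (8.25) follows. The formula has a removable singularity where
16$\alpha^2W^2t^2$ = 1; there the second factor is replaced by its limiting value π/4, which
follows from L'Hôpital's rule. The choice W = 0.5 (so that $T_b$ = 1) and the function name
`rc_pulse` are assumptions for the illustration.

```python
import math


def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def rc_pulse(t, w, alpha):
    """Raised-cosine pulse p(t) of (8.25); w is the Nyquist bandwidth W = 1 / (2 T_b)."""
    u = 4.0 * alpha * w * t                      # so that 16 alpha^2 W^2 t^2 = u^2
    if abs(abs(u) - 1.0) < 1e-9:
        shaping = math.pi / 4.0                  # limit of cos(pi u / 2) / (1 - u^2) at |u| = 1
    else:
        shaping = math.cos(math.pi * u / 2.0) / (1.0 - u * u)
    return sinc(2.0 * w * t) * shaping


w = 0.5                                          # T_b = 1
for alpha in (0.0, 0.5, 1.0):
    tail = abs(rc_pulse(2.5, w, alpha))          # midway between two sampling instants
    print(f"alpha = {alpha}: |p(2.5 T_b)| = {tail:.4f}")
```

Evaluated midway between the second and third zero crossings, the tail shrinks as
$\alpha$ grows, while the zeros at the sampling instants stay where they are.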

The special case with $\alpha$ = 1 (i.e., $f_1$ = 0) is known as the full-cosine roll-off
characteristic, for which the frequency response of (8.22) simplifies to

$$P(f) = \begin{cases} \dfrac{1}{4W}\left[ 1 + \cos\!\left( \dfrac{\pi f}{2W} \right) \right], & 0 \le |f| < 2W \\[2ex] 0, & |f| \ge 2W \end{cases} \tag{8.26}$$

Correspondingly, the time response p(t) simplifies to

$$p(t) = \frac{\mathrm{sinc}(4Wt)}{1 - 16W^2 t^2} \tag{8.27}$$

The time response of (8.27) exhibits two interesting properties:

1. At t = ±$T_b$/2 = ±1/(4W), we have p(t) = 0.5; that is, the pulse width measured at half
   amplitude is exactly equal to the bit duration $T_b$.
2. There are zero crossings at t = ±3$T_b$/2, ±5$T_b$/2, ... in addition to the usual zero
   crossings at the sampling times t = ±$T_b$, ±2$T_b$, ....

These two properties are extremely useful in extracting timing information from the
received signal for the purpose of synchronization. However, the price paid for this
desirable property is the use of a channel bandwidth double that required for the ideal
Nyquist channel for which $\alpha$ = 0: simply put, there is “no free lunch.”
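
Both properties can be confirmed from (8.27) directly. In the sketch below, the point
t = ±1/(4W), where numerator and denominator vanish together, is assigned the limiting
value 0.5, and a nearby point is evaluated as a consistency check. W = 0.5 (so that
$T_b$ = 1) and the function name are assumptions for the illustration.

```python
import math


def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def full_cosine_pulse(t, w):
    """Full-cosine roll-off pulse of (8.27), p(t) = sinc(4Wt) / (1 - 16 W^2 t^2)."""
    x = 4.0 * w * t
    if abs(abs(x) - 1.0) < 1e-9:
        return 0.5                               # limiting value at t = +/- 1 / (4W)
    return sinc(x) / (1.0 - x * x)


w = 0.5                                          # T_b = 1
print(full_cosine_pulse(0.5, w))                 # half amplitude at t = T_b / 2
print(full_cosine_pulse(0.5 + 1e-6, w))          # nearby value, close to 0.5
print(abs(full_cosine_pulse(1.5, w)) < 1e-12)    # extra zero crossing at 3 T_b / 2
print(abs(full_cosine_pulse(1.0, w)) < 1e-12)    # usual zero crossing at T_b
```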


In this example, we use the finite-duration impulse response (FIR) filter, also referred to as
the tapped-delay-line (TDL) filter, to model the raised-cosine (RC) filter; both terms are
used interchangeably. With the FIR filter operating in the discrete-time domain, there are
two time-scales to be considered:

1. Discretization of the input signal a(t) applied to the FIR model, for which we write

$$n = \frac{t}{T} \tag{8.28}$$

   where T is the sampling period in the FIR model shown in Figure 8.7. The tap inputs
   in this model are denoted by $a_n, a_{n-1}, \ldots, a_{n-l}, \ldots, a_{n-2l+1}, a_{n-2l}$, which, for
   some integer l, occupy the duration 2lT. Note that the FIR model in Figure 8.7 is
   symmetric about the midpoint, $a_{n-l}$, which satisfies the symmetric structure of the
   RC pulse.

Figure 8.7 TDL model of linear time-invariant system.

2. Discretization of the RC pulse p(t) for which we have

$$m = \frac{T_b}{T} \tag{8.29}$$

   where $T_b$ is the bit duration.

To model the RC pulse properly, the sampling rate of the model, 1/T, must be higher than
the bit rate, 1/$T_b$. It follows therefore that the integer m defined in (8.29) must be larger than
one. In assigning a suitable value to m, we must keep in mind the tradeoff between
modeling accuracy (requiring large m) and computational complexity (preferring small m).

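In any event, using (8.17), (8.28), and (8.29) to obtain the product

$$Wt = \frac{n}{2m} \tag{8.30}$$

and then substituting this result into (8.25), we get the discretized version of the RC pulse
as shown by

$$p_n = \mathrm{sinc}\!\left(\frac{n}{m}\right) \left[ \frac{\cos(\pi\alpha n/m)}{1 - 4\alpha^2 (n/m)^2} \right], \qquad n = 0, \pm 1, \pm 2, \ldots \tag{8.31}$$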

There are two computational difficulties encountered in the way in which the discretized
RC pulse, $p_n$, is defined in (8.31):

1. The pulse $p_n$ goes on indefinitely with increasing n.
2. The pulse is also noncausal in that the output signal $y_n$ in Figure 8.7 is produced
   before the input $a_n$ is applied to the FIR model.

To overcome difficulty 1, we truncate the sequence $p_n$ such that it occupies a finite
duration 2lT for some prescribed integer l, which is indeed what has been done in Figure 8.8.
To mitigate the non-causality problem 2, with T > $T_b$, the ratio n/m must be replaced by
(n/m) - l. In so doing, the truncated causal RC pulse assumes the following modified form:

$$p_n = \begin{cases} \mathrm{sinc}\!\left(\dfrac{n}{m} - l\right) \left[ \dfrac{\cos\!\left[\pi\alpha\left(\dfrac{n}{m} - l\right)\right]}{1 - 4\alpha^2\left(\dfrac{n}{m} - l\right)^2} \right], & -l \le n \le l \\[3ex] 0, & \text{otherwise} \end{cases} \tag{8.32}$$

where the value assigned to the integer l is determined by how long the truncated sequence
$\{p_n\}$ is desired to be.

With the desired formula of (8.32) for the FIR model of the RC pulse p(t) at hand,
Figure 8.8 plots this formula for the following specifications:

Sampling of the RC pulse, T = 10
Bit duration of the RC pulse, $T_b$ = 1
Number of the FIR samples per bit, m = 10
Roll-off factor of the RC pulse, $\alpha$ = 0.32

Two noteworthy points follow from Figure 8.8:

1. The truncated causal RC pulse $p_n$ of length 2l = 10 is symmetric about the
   midpoint, n = 5.
2. The $p_n$ is exactly zero at integer multiples of the bit duration $T_b$.

Both points reaffirm exactly what we know and therefore expect about the RC pulse p(t)
plotted in Figure 8.6b.

Figure 8.8 Discretized RC pulse, computed using the TDL.
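
The truncated, delayed RC sequence can be generated with a few lines of Python. This is a
sketch under stated assumptions rather than the procedure used to produce Figure 8.8: the
tap index is taken to run over 0 ≤ n ≤ 2lm, so that the shifted argument (n/m) - l sweeps
from -l to +l bit durations and the peak falls on the middle tap n = lm. That indexing is an
interpretation adopted here to obtain a causal, symmetric tap set, and the name
`rc_fir_taps` is hypothetical. The values m = 10, $\alpha$ = 0.32, and l = 5 follow the
specifications listed above.

```python
import math


def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def rc_fir_taps(alpha, m, l):
    """Samples of the RC pulse at x = n/m - l bit durations, for n = 0, 1, ..., 2*l*m."""
    taps = []
    for n in range(2 * l * m + 1):
        x = n / m - l                            # shifted time, in units of T_b
        u = 2.0 * alpha * x                      # so that 4 alpha^2 x^2 = u^2
        if abs(abs(u) - 1.0) < 1e-9:
            shaping = math.pi / 4.0              # limit at the removable singularity
        else:
            shaping = math.cos(math.pi * alpha * x) / (1.0 - u * u)
        taps.append(sinc(x) * shaping)
    return taps


alpha, m, l = 0.32, 10, 5
taps = rc_fir_taps(alpha, m, l)
mid = l * m

print(len(taps), taps[mid])                      # number of taps and the peak value
print(max(abs(taps[mid + k * m]) for k in range(-l, l + 1) if k != 0))
```

Under these assumptions the taps are symmetric about the middle tap, and every tap that
lies a whole number of bit durations away from it is zero to within rounding error,
matching the two points noted above.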


Square-Root Raised-Cosine Spectrum 


A more sophisticated form of pulse shaping uses the square-root raised-cosine (SRRC)
spectrum rather than the conventional RC spectrum of (8.22). Specifically, the spectrum
of the basic pulse is now defined by the square root of the right-hand side of this equation.
Thus, using the trigonometric identity

$$\cos^2\theta = \tfrac{1}{2}\,(1 + \cos 2\theta)$$

where, for the problem at hand, the angle

$$\theta = \frac{\pi}{4W\alpha}\,(|f| - f_1)$$

and, to avoid confusion, using G(f) as the symbol for the SRRC spectrum, we may write

$$G(f) = \begin{cases} \dfrac{1}{\sqrt{2W}}, & 0 \le |f| < f_1 \\[2ex] \dfrac{1}{\sqrt{2W}} \cos\!\left[ \dfrac{\pi}{4W\alpha}\,(|f| - f_1) \right], & f_1 \le |f| < 2W - f_1 \\[2ex] 0, & |f| \ge 2W - f_1 \end{cases} \tag{8.33}$$

where, as before, the roll-off factor $\alpha$ is defined in terms of the frequency parameter $f_1$ and
the bandwidth W as in (8.23).

If, now, the transmitter includes a pre-modulation filter with the transfer function
defined in (8.33) and the receiver includes an identical post-modulation filter, then under
ideal conditions the overall pulse waveform will experience the squared spectrum $G^2(f)$,
which is the regular RC spectrum. In effect, by adopting the SRRC spectrum G(f) of
(8.33) for pulse shaping, we would be working with $G^2(f) = P(f)$ in an overall
transmitter-receiver sense. On this basis, we find that in wireless communications, for
example, if the channel is affected by both fading and AWGN and the pulse-shape filtering
is partitioned equally between the transmitter and the receiver in the manner described
herein, then effectively the receiver would maximize the output SNR at the sampling
instants.

The inverse Fourier transform of (8.33) defines the SRRC shaping pulse:

$$g(t) = \frac{\sqrt{2W}}{1 - (8\alpha W t)^2} \left\{ \frac{\sin[2\pi W(1 - \alpha)t]}{2\pi W t} + \frac{4\alpha}{\pi} \cos[2\pi W(1 + \alpha)t] \right\} \tag{8.34}$$

The important point to note here is the fact that the SRRC shaping pulse g(t) of (8.34) is
radically different from the conventional RC shaping pulse of (8.25). In particular, the
new shaping pulse has the distinct property of satisfying the orthogonality constraint
under T-shifts, described by

$$\int_{-\infty}^{\infty} g(t)\, g(t - nT)\, dt = 0 \qquad \text{for } n = \pm 1, \pm 2, \ldots \tag{8.35}$$

where T is the symbol duration. Yet, the new pulse g(t) has exactly the same excess
bandwidth as the conventional RC pulse.

It is also important to note, however, that despite the added property of orthogonality,
the SRRC shaping pulse of (8.34) lacks the zero-crossing property of the conventional RC
shaping pulse defined in (8.25).
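
The orthogonality constraint of (8.35) can be checked by numerical integration. The sketch
below codes g(t) from (8.34), fills in the two removable singularities (at t = 0 and at
8$\alpha$W|t| = 1) with their limiting values obtained by L'Hôpital's rule, and approximates
the integral by a Riemann sum over a finite window. The window length, step size, the
values $\alpha$ = 0.5 and T = $T_b$ = 1, and the function names are all assumptions for the
illustration.

```python
import math


def srrc_pulse(t, w, alpha):
    """SRRC pulse g(t) of (8.34); w is the Nyquist bandwidth W = 1 / (2 T_b)."""
    scale = math.sqrt(2.0 * w)
    if abs(t) < 1e-12:                           # limit as t -> 0
        return scale * ((1.0 - alpha) + 4.0 * alpha / math.pi)
    u = 8.0 * alpha * w * t
    if abs(abs(u) - 1.0) < 1e-9:                 # limit as 8 alpha W |t| -> 1
        theta = math.pi / (4.0 * alpha)
        return alpha * math.sqrt(w) * (
            (1.0 + 2.0 / math.pi) * math.sin(theta)
            + (1.0 - 2.0 / math.pi) * math.cos(theta)
        )
    first = math.sin(2.0 * math.pi * w * (1.0 - alpha) * t) / (2.0 * math.pi * w * t)
    second = (4.0 * alpha / math.pi) * math.cos(2.0 * math.pi * w * (1.0 + alpha) * t)
    return scale * (first + second) / (1.0 - u * u)


def correlation(n, t_sym=1.0, alpha=0.5, half_span=40.0, step=0.02):
    """Riemann-sum approximation of the integral of g(t) g(t - n T) over a finite window."""
    w = 1.0 / (2.0 * t_sym)
    count = int(round(2.0 * half_span / step))
    total = 0.0
    for j in range(count + 1):
        t = -half_span + j * step
        total += srrc_pulse(t, w, alpha) * srrc_pulse(t - n * t_sym, w, alpha)
    return total * step


if __name__ == "__main__":
    for n in (0, 1, 2, 3):
        print(f"n = {n}: integral = {correlation(n):+.5f}")
```

For n = 0 the sum approximates the pulse energy, which is close to unity; for nonzero n it
is close to zero, as (8.35) requires.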

Figure 8.9a plots the SRRC spectrum G(f) for the roll-off factor $\alpha$ = 0, 0.5, 1; the
corresponding time-domain plots are shown in Figure 8.9b. These plots are naturally
different from those of Figure 8.6 for nonzero $\alpha$. The following example contrasts the
waveform of a specific binary sequence using the SRRC shaping pulse with the
corresponding waveform using the regular RC shaping pulse.

Figure 8.9 (a) G(f) for SRRC spectrum, plotted against normalized frequency f/W.
(b) g(t) for SRRC pulse.


Pulse Shaping Comparison Between SRRC and RC

Using the SRRC shaping pulse g(t) of (8.34) with roll-off factor $\alpha$ = 0.5, the requirement
is to plot the waveform for the binary sequence 01100 and compare it with the
corresponding waveform obtained by using the conventional RC shaping pulse p(t) of
(8.25) with the same roll-off factor.

Using the SRRC pulse g(t) of (8.34) with a multiplying plus sign for binary symbol 1 and
multiplying minus sign for binary symbol 0, we get the dashed pulse train shown in Figure
8.10 for the sequence 01100. The solid pulse train shown in the figure corresponds to the use
of the conventional RC pulse p(t) of (8.25). The figure clearly shows that the SRRC
waveform occupies a larger dynamic range than the conventional RC waveform: a feature
that distinguishes one from the other.

Figure 8.10 Two pulse trains for the sequence 01100, one using regular RC pulse (solid
line), and the other using an SRRC pulse (dashed line).


FIR Modeling of the Square-Root-Raised-Cosine Pulse 

In this example, we study FIR modeling of the SRRC pulse described in (8.34). To be
specific, we follow a procedure similar to that used for the RC pulse p(t) in Example 1,
taking care of the issues of truncation and noncausality. This is done by discretizing the
SRRC pulse, g(t), and substituting the dimensionless parameter (n/m) - l for 2Wt in
(8.34). In so doing we obtain the following sequence

$$g_n = \begin{cases} \dfrac{4\alpha}{\pi\sqrt{T_b}} \cdot \dfrac{ \dfrac{\sin\!\left[\pi(1 - \alpha)\left(\frac{n}{m} - l\right)\right]}{4\alpha\left(\frac{n}{m} - l\right)} + \cos\!\left[\pi(1 + \alpha)\left(\frac{n}{m} - l\right)\right] }{ 1 - 16\alpha^2\left(\frac{n}{m} - l\right)^2 }, & -l \le n \le l \\[4ex] 0, & \text{otherwise} \end{cases} \tag{8.36}$$

Since, by definition, the Fourier transform of the SRRC pulse, g(t), is equal to the square 
root of the Fourier transform of the RC pulse p(t), we may make the following statement: 


We say “essentially” here on account of the truncation applied to both (8.32) and (8.36). In
practice, when using the SRRC pulse for “ISI-free” baseband data transmission across a
band-limited channel, one FIR filter would be placed in the transmitter and the other
would be in the receiver.

To conclude this example, Figure 8.11a plots the SRRC sequence $g_n$ of (8.36) for the
same set of values used for the RC sequence $p_n$ in Figure 8.8. Figure 8.11b displays the
result of convolving the sequence in part a with $g_n$, that is, with itself.

Figure 8.11 (a) Discretized SRRC pulse, computed using FIR modeling.
(b) Discretized pulse resulting from the convolution of the pulse in part a with itself.

Two points are noteworthy from Figure 8.11:

1. The zero-crossings of the SRRC sequence $g_n$ do not occur at integer multiples of the
   bit duration $T_b$, which is to be expected.
2. The sequence plotted in Figure 8.11b is essentially equivalent to the RC sequence
   $p_n$, the zero-crossings of which do occur at integer multiples of the bit duration, and
   so they should.
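
The behavior described for Figure 8.11 can be reproduced with a short sketch. As before,
the tap index is assumed to run over 0 ≤ n ≤ 2lm so that the peak sits on the middle tap;
this indexing, the function names, and the scaling of the discrete convolution by the
sampling period T = $T_b$/m (so that it approximates the continuous-time convolution
integral) are assumptions made for the illustration. The taps are computed from the form
of (8.34) written in terms of x = (n/m) - l, which is algebraically the same as (8.36), with
the removable singularities filled in by their limits.

```python
import math


def srrc_taps(alpha, m, l, t_b=1.0):
    """Truncated, delayed SRRC samples on the assumed index range 0 <= n <= 2*l*m."""
    taps = []
    for n in range(2 * l * m + 1):
        x = n / m - l                            # shifted time, in units of T_b
        if abs(x) < 1e-12:                       # limit at the pulse center
            value = (1.0 - alpha) + 4.0 * alpha / math.pi
        elif alpha > 0 and abs(abs(4.0 * alpha * x) - 1.0) < 1e-9:
            theta = math.pi / (4.0 * alpha)      # limit where the denominator vanishes
            value = (alpha / math.sqrt(2.0)) * (
                (1.0 + 2.0 / math.pi) * math.sin(theta)
                + (1.0 - 2.0 / math.pi) * math.cos(theta)
            )
        else:
            numerator = (
                math.sin(math.pi * (1.0 - alpha) * x) / (math.pi * x)
                + (4.0 * alpha / math.pi) * math.cos(math.pi * (1.0 + alpha) * x)
            )
            value = numerator / (1.0 - (4.0 * alpha * x) ** 2)
        taps.append(value / math.sqrt(t_b))
    return taps


def convolve(a, b, step):
    """Discrete convolution scaled by the sampling period."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j * step
    return out


alpha, m, l = 0.32, 10, 5
g = srrc_taps(alpha, m, l)
cascade = convolve(g, g, step=1.0 / m)           # T = T_b / m with T_b = 1
center = 2 * l * m                               # peak of the cascade response

print(f"g one bit from its peak: {g[l * m + m]:+.4f}")
print(f"cascade peak:            {cascade[center]:+.4f}")
print(f"cascade one bit away:    {cascade[center + m]:+.4f}")
```

The single SRRC sequence is clearly nonzero one bit duration away from its peak, whereas
the cascade of two such filters is close to unity at its peak and close to zero at the
neighboring sampling instants, up to the small error introduced by truncation.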


Post-Processing Techniques: The Eye Pattern 


The study of signaling over band-limited channels would be incomplete without discussing
the idea of post-processing, the essence of which is to manipulate a given set of data so as
to provide a visual interpretation of the data rather than just a numerical listing of the data.
For an illustrative example, consider the formulas for the BER of digital modulation
schemes operating over an AWGN channel, which were summarized in Table 7.7 of
Chapter 7. The graphical plots of the schemes, shown in Figure 7.47, provide an immediate
comparison of how these different modulation schemes compete with each other in terms
of performance measured on the basis of their respective BERs for varying $E_b/N_0$. In other
words, there is much to be gained from graphical plots that are most conveniently made
possible by computation.

What we have in mind in this section, however, is the description of a commonly used 
post-processor, namely eye patterns, which are particularly suited for the experimental 
study of digital communication systems. 

The eye pattern, also referred to as the eye diagram, is produced by the synchronized
superposition of (as many as possible) successive symbol intervals of the distorted
waveform appearing at the output of the receive filter prior to thresholding. As an
illustrative example, consider the distorted, but noise-free, waveform shown in part a of
Figure 8.12. Part b of the figure displays the corresponding synchronized superposition of
the waveform’s eight binary symbol intervals. The resulting display is called an “eye
pattern” because of its resemblance to a human eye. By the same token, the interior of the
eye pattern is called the eye opening.



Figure 8.12 (a) Binary data sequence and its waveform. (b) Corresponding eye pattern.

As long as the additive channel noise is not large, the eye pattern is well defined
and may, therefore, be studied experimentally on an oscilloscope. The waveform under
study is applied to the deflection plates of the oscilloscope with its time-base circuit
operating in a synchronized condition. From an experimental perspective, the eye pattern
offers two compelling virtues:

• The simplicity of eye-pattern generation. 

• The provision of a great deal of insightful information about the characteristics of 
the data transmission system. Hence, the wide use of eye patterns as a visual 
indicator of how well or poorly a data transmission system performs the task of 
transporting a data sequence across a physical channel. 
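
The simplicity of eye-pattern generation carries over to software. The following sketch
shows only the core step, the synchronized slicing of a sampled waveform into
symbol-length traces that would then be drawn on a common time axis; the plotting itself
is left out. The function name `eye_traces`, the `offset` parameter, and the short integer
waveform used in the demonstration are hypothetical.

```python
def eye_traces(waveform, samples_per_symbol, offset=0):
    """Cut a sampled waveform into consecutive traces, each one symbol interval long.

    waveform           -- list of receive-filter output samples
    samples_per_symbol -- number of samples in one symbol interval
    offset             -- index of the first sample of the first trace, used to
                          align the traces with the symbol boundaries
    """
    traces = []
    start = offset
    while start + samples_per_symbol <= len(waveform):
        traces.append(waveform[start:start + samples_per_symbol])
        start += samples_per_symbol
    return traces


# Stand-in waveform: twelve samples, four samples per symbol interval.
waveform = list(range(12))
print(eye_traces(waveform, samples_per_symbol=4))
print(eye_traces(waveform, samples_per_symbol=4, offset=2))
```

Overlaying the returned traces, one curve per trace against the sample index within the
symbol interval, gives the software counterpart of the oscilloscope display.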


Figure 8.13 shows a generic eye pattern for distorted but noise-free binary data. The
horizontal axis, representing time, spans the symbol interval from -$T_b$/2 to $T_b$/2, where
$T_b$ is the bit duration. From this diagram, we may infer three timing features pertaining to
a binary data transmission system, exemplified by a PAM system:

1. Optimum sampling time. The width of the eye opening defines the time interval over
   which the distorted binary waveform appearing at the output of the receive filter in
   the PAM system can be uniformly sampled without decision errors. Clearly, the
   optimum sampling time is the time at which the eye opening is at its widest.
2. Zero-crossing jitter. In practice, the timing signal (for synchronizing the receiver to
   the transmitter) is extracted from the zero-crossings of the waveform that appears at
   the receive-filter output. In such a form of synchronization, there will always be
   irregularities in the zero-crossings, which, in turn, give rise to jitter and, therefore,
   nonoptimum sampling times.
3. Timing sensitivity. Another timing-related feature is the sensitivity of the PAM
   system to timing errors. This sensitivity is determined by the rate at which the eye
   pattern is closed as the sampling time is varied.

Figure 8.13 indicates how these three timing features of the system (and other insightful
attributes) can be measured from the eye pattern.


Figure 8.13 Interpretation of the eye pattern for a baseband binary data transmission system.
Labeled features include the best sampling time, the distortion at the sampling time, and
the time interval over which the wave is best sampled.

Hereafter, we assume that the ideal signal amplitude is scaled to occupy the range from -1
to +1. We then find that, in the absence of channel noise, the eye opening assumes two
extreme values:

1. An eye opening of unity, which corresponds to zero ISI.
2. An eye opening of zero, which corresponds to a completely closed eye pattern; this
   second extreme case occurs when the effect of intersymbol interference is severe
   enough for some upper traces in the eye pattern to cross its lower traces.

It is indeed possible for the receiver to make decision errors even when the channel is 
noise free. Typically, an eye opening of 0.5 or better is considered to yield reliable data 
transmission. 

In a noisy environment, the extent of eye opening at the optimum sampling time 
provides a measure of the operating margin over additive channel noise. This measure, as 
illustrated in Figure 8.13, is referred to as the noise margin. 

From this discussion, it is apparent that the eye opening plays an important role in 
assessing system performance; hence the need for a formal definition of the eye opening. 
To this end, we offer the following definition: 

$$\text{Eye opening} = 1 - D_{\text{peak}}$$

where $D_{\text{peak}}$ denotes a new criterion called the peak distortion. The point to note here is
that peak distortion is a worst-case criterion for assessing the effect of ISI on the 
performance (i.e., error rate) of a data transmission system. The relationship between the 
eye opening and peak distortion is illustrated in Figure 8.14. With the eye opening being 
dimensionless, the peak distortion is dimensionless too. To emphasize this statement, the 
two extreme values of the eye opening translate as follows: 

Zero peak distortion, which occurs when the eye opening is unity. 

Unity peak distortion, which occurs when the eye pattern is completely closed. 



Figure 8.14 Illustrating the relationship between peak distortion and eye opening.
Note: the ideal signal level is scaled to lie inside the range -1 to +1.

With this background, we offer the following definition:


Referring to (8.10), the two components embodied in this definition are themselves 
defined as follows: 

The idealized signal component of the receive filter output is defined by the first
term in (8.10), namely $a_i$, where $a_i$ is the $i$th encoded symbol, assuming unit transmitted
signal energy per bit.

The intersymbol interference is defined by the second term, namely

$$\sum_{\substack{k=-\infty \\ k \neq i}}^{\infty} a_k p_{i-k}$$

where $p_{i-k}$ stands for the term $p[(i - k)T_b]$. The maximum value of this summation
occurs when each encoded symbol $a_k$ has the same algebraic sign as $p_{i-k}$. Therefore,

$$\text{Maximum ISI} = \sum_{\substack{k=-\infty \\ k \neq i}}^{\infty} |p_{i-k}|$$

Hence, invoking the definition of peak distortion, we get the desired formula:

$$D_{\text{peak}} = \sum_{\substack{k=-\infty \\ k \neq i}}^{\infty} |p_{i-k}|$$

where $p_0 = 1$ for all $i = k$. Note that, by invoking the assumption of a signal amplitude
from -1 to +1, we have scaled the transmitted signal energy for a binary symbol to be
unity.

By its very nature, the peak distortion is a worst-case criterion for data transmission 
over a noisy channel. The eye opening specifies the smallest possible noise margin. 
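
The peak-distortion formula lends itself to a quick numerical check. The following is a minimal sketch, assuming the sampled overall pulse is available as a finite list; the function names and the sample values are hypothetical and chosen only for illustration.

```python
def peak_distortion(pulse_samples, main_index):
    """Return D_peak: the sum of |p_{i-k}| over k != i, normalized by |p_0|."""
    p0 = abs(pulse_samples[main_index])
    if p0 == 0.0:
        raise ValueError("the main pulse sample p_0 must be nonzero")
    worst_case_isi = sum(
        abs(p) for j, p in enumerate(pulse_samples) if j != main_index
    )
    return worst_case_isi / p0


def eye_opening(pulse_samples, main_index):
    """Eye opening = 1 - D_peak (a value <= 0 means a completely closed eye)."""
    return 1.0 - peak_distortion(pulse_samples, main_index)


# Hypothetical pulse samples p[(i - k)T_b]; the main sample p_0 = 1 sits at index 2.
samples = [0.05, -0.10, 1.00, 0.20, -0.05]
print(round(peak_distortion(samples, 2), 6))  # worst-case ISI relative to p_0
print(round(eye_opening(samples, 2), 6))      # remaining noise margin
```

A pulse with zero samples at every instant other than the main one gives zero peak distortion and an eye opening of unity, in agreement with the first extreme case listed earlier.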


By definition, an $M$-ary data transmission system uses $M$ encoded symbols in the
transmitter and $M - 1$ thresholds in the receiver. Correspondingly, the eye pattern for an
$M$-ary data transmission system contains $M - 1$ eye openings stacked vertically one on
top of the other. The thresholds are defined by the amplitude-transition levels as we move
up from one eye opening to the adjacent eye opening. When the encoded symbols are all
equiprobable, the thresholds will be equidistant from each other.

In a strictly linear data transmission system with truly random transmitted data
sequences, all the $M - 1$ eye openings would be identical. In practice, however, it is often
possible to find asymmetries in the eye pattern of an $M$-ary data transmission system,
which are caused by nonlinearities in the communication channel or other
distortion-sensitive parts of the system.
Eye Patterns for Binary and Quaternary Systems

Figure 8.15a and b depict the eye patterns for a baseband PAM transmission system using
$M = 2$ and $M = 4$, respectively. The channel has no bandwidth limitation and the
source symbols used are obtained from a random number generator. A raised-cosine (RC)
pulse is used in both cases. The system parameters used for the generation of these eye
patterns are a bit rate of 1 Hz and roll-off factor $\alpha = 0.5$. For the binary case of $M = 2$ in
Figure 8.15a, the symbol duration $T$ and the bit duration $T_b$ are the same, with $T_b = 1$ s.
For the case of $M = 4$ in Figure 8.15b we have $T = T_b \log_2 M = 2T_b$. In both cases we see
that the eyes are open, indicating perfectly reliable operation of the system, perfect in the
sense that the ISI is zero.

Figure 8.15 Eye diagrams of received signal with no bandwidth limitation: (a) M = 2; (b) M = 4.

Figure 8.16a and b show the eye patterns for these two baseband-pulse transmission
systems using the same system parameters as before, but this time under a
bandwidth-limited condition. Specifically, the channel is now modeled by a low-pass
Butterworth filter, whose frequency response is defined by

$$|H(f)| = \frac{1}{1 + (f/f_0)^{2N}}$$

where $N$ is the order of the filter, and $f_0$ is the 3-dB cutoff frequency of the filter. For the
results displayed in Figure 8.16, the following filter parameter values were used:

$N = 3$, and $f_0 = 0.6$ Hz for binary PAM
$N = 3$, and $f_0 = 0.3$ Hz for 4-PAM

With the roll-off factor $\alpha = 0.5$ and Nyquist bandwidth $W = 0.5$ Hz, for binary PAM,
the use of (8.24) defines the transmission bandwidth of the PAM transmission system to be

$$B_T = 0.5(1 + 0.5) = 0.75 \text{ Hz}$$

Although the channel bandwidth cutoff frequency is greater than absolutely necessary, its
effect on the passband is observed in a decrease in the size of the eye opening. Instead of
the distinct values at time $t = 1$ s, shown in Figure 8.15a and b, now there is a blurred
region. If the channel bandwidth were to be reduced further, the eye would close even
more until finally no distinct eye opening would be recognizable.

Figure 8.16 Eye diagrams of received signal, using a bandwidth-limited channel: (a) M = 2; (b) M = 4.
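
To see why the band-limited channel narrows the eye, it helps to evaluate the quoted filter expression across the signal band. The sketch below is only illustrative: the function names are hypothetical, and the expression is evaluated exactly as quoted for the channel model.

```python
def transmission_bandwidth(nyquist_bandwidth_hz, rolloff):
    """B_T = W(1 + alpha), the form used in the worked value above."""
    return nyquist_bandwidth_hz * (1.0 + rolloff)


def butterworth_response(f_hz, cutoff_hz, order):
    """Evaluate 1 / (1 + (f/f0)^(2N)) as quoted for the channel model.

    Note: the conventional Butterworth definition assigns this expression to
    |H(f)|^2; take the square root if that convention is wanted.
    """
    return 1.0 / (1.0 + (f_hz / cutoff_hz) ** (2 * order))


b_t = transmission_bandwidth(0.5, 0.5)  # binary PAM: W = 0.5 Hz, alpha = 0.5
for f in (0.0, 0.3, 0.6, b_t):
    print(f, round(butterworth_response(f, cutoff_hz=0.6, order=3), 4))
```

The printed values fall off steeply between the cutoff frequency and the edge of the transmission band, which is the attenuation that shows up as a smaller eye opening.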


Adaptive Equalization 


In this section we develop a simple and yet effective algorithm for the adaptive
equalization of a linear channel of unknown characteristics. Figure 8.17 shows the structure
of an adaptive synchronous equalizer, which incorporates the matched filtering action. The
algorithm used to adjust the equalizer coefficients assumes the availability of a desired
response. One’s first reaction to the availability of a replica of the transmitted signal is: If
such a signal is available at the receiver, why do we need adaptive equalization? To answer
this question, we first note that a typical telephone channel changes little during an
average data call. Accordingly, prior to data transmission, the equalizer is adjusted under the
guidance of a training sequence transmitted through the channel. A synchronized version
of this training sequence is generated at the receiver, where (after a time shift equal to the
transmission delay through the channel) it is applied to the equalizer as the desired
response. A training sequence commonly used in practice is the pseudonoise (PN)
sequence, which consists of a deterministic periodic sequence with noise-like
characteristics. Two identical PN sequence generators are used, one at the transmitter and
the other at the receiver. When the training process is completed, the PN sequence
generator is switched off and the adaptive equalizer is ready for normal data transmission.
A detailed description of PN sequence generators is presented in Appendix J.

Figure 8.17 Block diagram of adaptive equalizer using an adjustable TDL filter.


To simplify notational matters, we let

$$x_n = x(nT)$$
$$y_n = y(nT)$$

Then, the output $y_n$ of the tapped-delay-line (TDL) equalizer in response to the input
sequence $\{x_n\}$ is defined by the discrete convolution sum (see Figure 8.17)

$$y_n = \sum_{k=0}^{N} w_k x_{n-k} \tag{8.41}$$

where $w_k$ is the weight at the $k$th tap and $N + 1$ is the total number of taps. The tap weights
constitute the adaptive equalizer coefficients. We assume that the input sequence $x_n$ has
finite energy. We have used a notation for the equalizer weights in Figure 8.17 that is
different from the corresponding notation in Figure 6.17 to emphasize the fact that the
equalizer in Figure 8.17 also incorporates matched filtering.

The adaptation may be achieved by observing the error between the desired pulse shape 
and the actual pulse shape at the equalizer output, measured at the sampling instants, and 
then using this error to estimate the direction in which the tap weights of the equalizer 
should be changed so as to approach an optimum set of values. For the adaptation, we may 
use a criterion based on minimizing the peak distortion, defined as the worst-case 
intersymbol interference at the output of the equalizer. However, the equalizer so designed
is optimum only when the peak distortion at its input is less than 100% (i.e., the 
intersymbol interference is not too severe). A better approach is to use a mean-square error 
criterion, which is more general in application; also, an adaptive equalizer based on the 
mean-square error (MSE) criterion appears to be less sensitive to timing perturbations 
than one based on the peak-distortion criterion. Accordingly, in what follows we use the 
MSE criterion to derive the adaptive equalization algorithm. 

Let $a_n$ denote the desired response defined as the polar representation of the $n$th
transmitted binary symbol. Let $e_n$ denote the error signal defined as the difference
between the desired response $a_n$ and the actual response $y_n$ of the equalizer, as shown by

$$e_n = a_n - y_n$$

In the least-mean-square (LMS) algorithm for adaptive equalization, the error signal $e_n$
actuates the adjustments applied to the individual tap weights of the equalizer as the
algorithm proceeds from one iteration to the next. A derivation of the LMS algorithm for
adaptive prediction was presented in Section 6.7 of Chapter 6. Recasting (6.85) into its
most general form, we may restate the formula for the LMS algorithm in words as follows:

$$
\begin{pmatrix} \text{Updated value} \\ \text{of $k$th tap weight} \end{pmatrix}
= \begin{pmatrix} \text{Old value of} \\ \text{$k$th tap weight} \end{pmatrix}
+ \begin{pmatrix} \text{Step-size} \\ \text{parameter} \end{pmatrix}
\begin{pmatrix} \text{Input signal applied} \\ \text{to $k$th tap weight} \end{pmatrix}
\begin{pmatrix} \text{Error} \\ \text{signal} \end{pmatrix}
\tag{8.43}
$$

Let $\mu$ denote the step-size parameter. From Figure 8.17 we see that the input signal applied
to the $k$th tap weight at time step $n$ is $x_{n-k}$. Hence, using $w_{k,n}$ as the old value of the $k$th
tap weight at time step $n$, the updated value of this tap weight at time step $n + 1$ is, in light
of (8.43), defined by

$$w_{k,n+1} = w_{k,n} + \mu x_{n-k} e_n, \quad k = 0, 1, \ldots, N$$

where

$$e_n = a_n - \sum_{k=0}^{N} w_{k,n} x_{n-k}$$

These two equations constitute the LMS algorithm for adaptive equalization.

We may simplify the formulation of the LMS algorithm using matrix notation. Let the
$(N + 1)$-by-1 vector $\mathbf{x}_n$ denote the tap inputs of the equalizer:

$$\mathbf{x}_n = [x_n, \ldots, x_{n-N+1}, x_{n-N}]^T$$

where the superscript $T$ denotes matrix transposition. Correspondingly, let the $(N + 1)$-by-1
vector $\mathbf{w}_n$ denote the tap weights of the equalizer:

$$\mathbf{w}_n = [w_{0,n}, w_{1,n}, \ldots, w_{N,n}]^T$$

We may then use matrix notation to recast the discrete convolution sum of (8.41) in the
compact form

$$y_n = \mathbf{x}_n^T \mathbf{w}_n$$

where $\mathbf{x}_n^T \mathbf{w}_n$ is referred to as the inner product of the vectors $\mathbf{x}_n$ and $\mathbf{w}_n$. We may now
summarize the LMS algorithm for adaptive equalization as follows:

1. Initialize the algorithm by setting $\mathbf{w}_1 = \mathbf{0}$ (i.e., set all the tap weights of the
equalizer to zero at $n = 1$, which corresponds to time $t = T$).

2. For $n = 1, 2, \ldots$, compute

$$y_n = \mathbf{x}_n^T \mathbf{w}_n$$
$$e_n = a_n - y_n$$
$$\mathbf{w}_{n+1} = \mathbf{w}_n + \mu e_n \mathbf{x}_n$$

where $\mu$ is the step-size parameter.

3. Continue the iterative computation until the equalizer reaches a “steady state,” by
which we mean that the actual mean-square error of the equalizer essentially reaches
a constant value.
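
The three steps above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a reference implementation: the two-sample channel, the random polar sequence standing in for a PN training sequence, the tap count, and the step size are all hypothetical choices, and the channel is taken to be noise free.

```python
import random


def channel_output(symbols, impulse_response):
    """Noise-free discrete convolution of the symbols with the channel samples."""
    return [
        sum(h * symbols[n - k] for k, h in enumerate(impulse_response) if n - k >= 0)
        for n in range(len(symbols))
    ]


def lms_equalizer(received, desired, num_taps, step_size):
    """Train a TDL equalizer with LMS; return (tap weights, error sequence)."""
    weights = [0.0] * num_taps  # step 1: all taps start at zero
    errors = []
    for n, a_n in enumerate(desired):
        # Tap-input vector [x_n, x_{n-1}, ..., x_{n-N}]; earlier samples are zero.
        x_vec = [received[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y_n = sum(w * x for w, x in zip(weights, x_vec))  # y_n = x_n^T w_n
        e_n = a_n - y_n                                   # e_n = a_n - y_n
        weights = [w + step_size * e_n * x for w, x in zip(weights, x_vec)]
        errors.append(e_n)
    return weights, errors


def run_training_demo(seed=1):
    rng = random.Random(seed)
    # Random polar symbols stand in for the PN training sequence.
    training = [rng.choice((-1.0, 1.0)) for _ in range(2000)]
    received = channel_output(training, [1.0, 0.3])  # hypothetical channel
    weights, errors = lms_equalizer(received, training, num_taps=5, step_size=0.05)
    tail_mse = sum(e * e for e in errors[-200:]) / 200
    return weights, tail_mse


weights, tail_mse = run_training_demo()
print([round(w, 3) for w in weights], tail_mse)
```

For this mild channel the leading taps settle close to 1 and -0.3, which is the start of the inverse of the assumed channel, and the squared error averaged over the final symbols becomes very small.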

The LMS algorithm is an example of a feedback system, as illustrated in the block 
diagram of Figure 8.18, which pertains to the kth filter coefficient. It is therefore possible 
for the algorithm to diverge (i.e., for the adaptive equalizer to become unstable). 
Figure 8.18 Signal-flow graph representation of the LMS algorithm involving the $k$th tap weight.

Unfortunately, the convergence behavior of the LMS algorithm is difficult to analyze.
Nevertheless, provided that the step-size parameter $\mu$ is assigned a small value, we find
that after a large number of iterations the behavior of the LMS algorithm is roughly similar
to that of the steepest-descent algorithm (discussed in Chapter 6), which uses the actual
gradient rather than a noisy estimate for the computation of the tap weights.


There are two modes of operation for an adaptive equalizer, namely the training mode and 
decision-directed mode, as shown in Figure 8.19. During the training mode, a known PN
sequence is transmitted and a synchronized version of it is generated in the receiver, where 
(after a time shift equal to the transmission delay) it is applied to the adaptive equalizer as 
the desired response; the tap weights of the equalizer are thereby adjusted in accordance 
with the LMS algorithm. 

When the training process is completed, the adaptive equalizer is switched to its second
mode of operation: the decision-directed mode. In this mode of operation, the error signal
is defined by

$$e_n = \hat{a}_n - y_n$$

where $y_n$ is the equalizer output at time $t = nT$ and $\hat{a}_n$ is the final (not necessarily
correct) estimate of the transmitted symbol $a_n$. Now, in normal operation the decisions made by
the receiver are correct with high probability. This means that the error estimates are
correct most of the time, thereby permitting the adaptive equalizer to operate satisfactorily.
Furthermore, an adaptive equalizer operating in a decision-directed mode is able to track
relatively slow variations in channel characteristics.

Figure 8.19 Illustrating the two operating modes of an adaptive equalizer: for the training mode, the
switch is in position 1; for the tracking mode, it is moved to position 2.

It turns out that the larger the step-size parameter $\mu$ is, the faster the tracking capability
of the adaptive equalizer. However, a large step-size parameter $\mu$ may result in an
unacceptably high excess mean-square error, defined as that part of the mean-square value
of the error signal in excess of the minimum attainable value, which results when the tap
weights are at their optimum settings. We therefore find that, in practice, the choice of a
suitable value for the step-size parameter $\mu$ involves making a compromise between fast
tracking and reducing the excess mean-square error.
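
Decision-directed operation differs from training only in where the desired response comes from. The sketch below assumes a binary slicer, a noise-free channel, and a hypothetical scenario in which taps trained for one channel must follow a small drift; all names and numerical values are illustrative.

```python
import random


def slicer(y):
    """Binary decision device for polar symbols (+1 / -1)."""
    return 1.0 if y >= 0.0 else -1.0


def decision_directed_lms(received, initial_weights, step_size):
    """Adapt the taps using e_n = a_hat_n - y_n, with a_hat_n the slicer output."""
    weights = list(initial_weights)
    num_taps = len(weights)
    decisions, errors = [], []
    for n in range(len(received)):
        x_vec = [received[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y_n = sum(w * x for w, x in zip(weights, x_vec))
        a_hat = slicer(y_n)  # the estimate replaces the known training symbol
        e_n = a_hat - y_n
        weights = [w + step_size * e_n * x for w, x in zip(weights, x_vec)]
        decisions.append(a_hat)
        errors.append(e_n)
    return weights, decisions, errors


def run_tracking_demo(seed=2, step_size=0.02):
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(3000)]
    # Hypothetical scenario: taps were trained for the channel [1, 0.3],
    # and the channel has since drifted to [1, 0.4].
    trained = [1.0, -0.3, 0.09, -0.027, 0.0081]
    drifted = [1.0, 0.4]
    received = [
        sum(h * symbols[n - k] for k, h in enumerate(drifted) if n - k >= 0)
        for n in range(len(symbols))
    ]
    weights, decisions, errors = decision_directed_lms(received, trained, step_size)
    tail_mse = sum(e * e for e in errors[-200:]) / 200
    return weights, decisions == symbols, tail_mse


weights, all_correct, tail_mse = run_tracking_demo()
print([round(w, 3) for w in weights], all_correct, tail_mse)
```

Rerunning `run_tracking_demo` with different `step_size` values is one way to explore the compromise between tracking speed and excess mean-square error described above.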


To develop further insight into adaptive equalization, consider a baseband channel with
impulse response denoted in its sampled form by the sequence $\{h_n\}$, where $h_n = h(nT)$.
The response of this channel to an input sequence $\{x_n\}$, in the absence of noise, is given by
the discrete convolution sum

$$
y_n = \sum_{k} h_k x_{n-k}
= h_0 x_n + \sum_{k < 0} h_k x_{n-k} + \sum_{k > 0} h_k x_{n-k}
\tag{8.50}
$$

The first term of (8.50) represents the desired data symbol. The second term is due to the
precursors of the channel impulse response that occur before the main sample $h_0$
associated with the desired data symbol. The third term is due to the postcursors of the
channel impulse response that occur after the main sample $h_0$. The precursors and
postcursors of a channel impulse response are illustrated in Figure 8.20. The idea of
decision-feedback equalization is to use data decisions made on the basis of precursors of
the channel impulse response to take care of the postcursors; for the idea to work,
however, the decisions would obviously have to be correct for the DFE to function
properly most of the time.

Figure 8.20 Impulse response of a discrete-time channel, depicting the precursors and postcursors.

Figure 8.21 Block diagram of decision-feedback equalizer.

A DFE consists of a feedforward section, a feedback section, and a decision device 
connected together as shown in Figure 8.21. The feedforward section consists of a TDL 
filter whose taps are spaced at the reciprocal of the signaling rate. The data sequence to be 
equalized is applied to this section. The feedback section consists of another TDL filter 
whose taps are also spaced at the reciprocal of the signaling rate. The input applied to the 
feedback section consists of the decisions made on previously detected symbols of the 
input sequence. The function of the feedback section is to subtract out that portion of the 
intersymbol interference produced by previously detected symbols from the estimates of 
future samples. 

Note that the inclusion of the decision device in the feedback loop makes the equalizer 
intrinsically nonlinear and, therefore, more difficult to analyze than an ordinary LMS 
equalizer. Nevertheless, the mean-square error criterion can be used to obtain a 
mathematically tractable optimization of a DFE. Indeed, the LMS algorithm can be used 
to jointly adapt both the feedforward tap weights and the feedback tap weights based on a 
common error signal. 
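
The joint adaptation of the two sections can be sketched as follows. This is an illustrative sketch under stated assumptions: a binary slicer as the decision device, a noise-free channel with a main sample and two postcursors, a training prefix followed by decision-directed operation, and hypothetical tap counts and step size.

```python
import random


def slicer(y):
    """Binary decision device for polar symbols (+1 / -1)."""
    return 1.0 if y >= 0.0 else -1.0


def dfe_lms(received, training, ff_taps, fb_taps, step_size):
    """Decision-feedback equalizer adapted by LMS with a common error signal.

    Known training symbols drive the adaptation first; once they run out,
    the slicer decisions take their place (decision-directed operation).
    """
    ff = [0.0] * ff_taps  # feedforward tap weights
    fb = [0.0] * fb_taps  # feedback tap weights
    detected = []         # symbols available for feedback: a_{n-1}, a_{n-2}, ...
    for n in range(len(received)):
        x_vec = [received[n - k] if n - k >= 0 else 0.0 for k in range(ff_taps)]
        past = [detected[n - 1 - k] if n - 1 - k >= 0 else 0.0 for k in range(fb_taps)]
        # Feedforward output minus the ISI estimated from previously detected symbols.
        y_n = sum(w * x for w, x in zip(ff, x_vec)) - sum(b * d for b, d in zip(fb, past))
        reference = training[n] if n < len(training) else slicer(y_n)
        e_n = reference - y_n
        ff = [w + step_size * e_n * x for w, x in zip(ff, x_vec)]
        fb = [b - step_size * e_n * d for b, d in zip(fb, past)]  # dy/db = -d
        detected.append(reference)
    return ff, fb, detected


def run_dfe_demo(seed=3):
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(1500)]
    channel = [1.0, 0.5, 0.2]  # hypothetical: main sample plus two postcursors
    received = [
        sum(h * symbols[n - k] for k, h in enumerate(channel) if n - k >= 0)
        for n in range(len(symbols))
    ]
    ff, fb, detected = dfe_lms(received, symbols[:500], ff_taps=1, fb_taps=2, step_size=0.05)
    return ff, fb, detected == symbols


ff, fb, all_correct = run_dfe_demo()
print([round(w, 3) for w in ff], [round(b, 3) for b in fb], all_correct)
```

In this idealized setting the feedback taps converge toward the two postcursor samples, so the feedback section cancels the postcursor ISI exactly once the past decisions are correct.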

Broadband Backbone Data Network: Signaling over Multiple 
Baseband Channels 


Up to this point in the chapter, the discussion has focused on signaling over a single
band-limited channel and related issues such as adaptive equalization. In order to set the
stage for the rest of the chapter devoted to signaling over a linear broadband channel
purposely partitioned into a set of subchannels, this section on the broadband backbone data
network, namely the public switched telephone network (PSTN), is intended to provide a
transition from the first part of the chapter to the second part.

The PSTN was originally built to provide a ubiquitous structure for the digital 
transmission of voice signals using PCM, which was discussed previously in Chapter 6. 
As such, traditionally, the PSTN has been viewed as an analog network. In reality, 
however, the PSTN has evolved into an almost entirely digital network. We say “almost
entirely” because the analog part refers to the local network, which stands for short
connections from a home to the central office.

For many decades past, data transmission over the PSTN relied on the use of modems;
the term “modem” is a contraction of modulator-demodulator. Despite the enormous
effort that was put into the design of modems, they could not cope with the ever-increasing
rate of data transmission. This situation prevailed until the advent of digital subscriber line 
(DSL) technology in the 1990s. The inquisitive reader may well ask the question: How was 
it that the modem theorists and designers got it wrong while the DSL theorists and 
designers got it right? Unfortunately, in the development of modems, the telephone 
channel was treated as one whole entity. On the other hand, the development of DSL 
abandoned the traditional approach by viewing the telephone channel as a conglomeration 
of subchannels extending over a wide frequency band and operating in parallel, and with 
each subchannel treated as a narrowband channel, thereby exploiting Shannon’s 
information capacity law in a much more effective manner. 

It is therefore not surprising that the DSL technology has converted an ordinary 
telephone line into a broadband communication link, so much so that we may now view 
the PSTN effectively as a broadband backbone data network, which is being widely used 
all over the world. The data consist of digital signals generated by computers or Internet 
service providers (ISPs). Most importantly, the deployment of DSL technology has 
literally made it possible to increase the rate of data transmission across a telephone 
channel by orders of magnitude compared with the old modems. This transition from 
modem to DSL technology is indeed an impressive engineering accomplishment, which 
resulted from “thinking outside the box.” 

With this brief historical account, it is apropos that we devote the rest of the chapter to 
the underlying theory of the widely used DSL technology. 

Digital Subscriber Lines 


The term DSL is commonly used to refer to a family of different technologies that operate 
over a local loop less than 1.5 km to provide for digital signal transmission between a user 
terminal (e.g., computer) and the central office (CO) of a telephone company. Through the 
CO, the user is connected directly to the so-called broadband backbone data network, 
whereby transmission is maintained in the digital domain. In the course of transmission, 
the digital signal is switched and routed at regular intervals. Figure 8.22 is a schematic 
diagram illustrating that typically the data rate upstream (i.e., in the direction of the ISP) is 
lower than the data rate downstream (i.e., in the direction of the user). It is for this reason 
that the DSL is said to be asymmetric, hence the acronym ADSL. 

The twisted wire-pair used in the local loop, the only analog part of the data
transmission system as remarked earlier, is inductively loaded. Specifically, extra
inductance is purposely supplied by local coils, which are inserted at regular intervals
across the wire-pair. This addition is made in order to produce a fairly flat frequency
response across the effective voice band. However, the improvement so gained for the
transmission of voice signals is attained at the expense of continually increasing
attenuation at frequencies higher than 3.4 kHz. Figure 8.23a illustrates the two different
frequency bands allocated to a frequency-division multiplexing (FDM)-based ADSL; the
way in which two filters, one high-pass and the other low-pass, are used to connect the
DSL to the local loop is shown in Figure 8.23b.

Figure 8.22 Block diagram depicting the operational environment of DSL.

Figure 8.23 (a) Illustrating the different band allocations for an FDM-based ADSL system.
(b) Block diagram of splitter performing the function of a multiplexer or demultiplexer.
Note: both filters in the splitter are bidirectional filters.

With access to the wide band represented by frequencies higher than 3.6 kHz, the DSL 
uses discrete multicarrier transmission (DMT) techniques to convert the twisted wire-pair 
in the local loop into a broadband communication link; the two terms “multichannel” and 
“multicarrier” are used interchangeably. The net result is that data rates of 1.5 to 9.0 Mbps
are attained downstream in a bandwidth of up to 1 MHz and over a distance of 2.7 to 5.5 km. Very
high-bit-rate digital subscriber lines (VDSLs) do even better, supporting data rates of 13 to 
52 Mbps downstream in a bandwidth of up to 30 MHz and over a distance of 0.3 to 1.5 km. 
These numbers indicate that the data rates attainable by DSL technology depend on both 
bandwidth and distance, and the technology continues to improve. 

The basic idea behind DMT is rooted in a commonly used engineering paradigm:

Divide and conquer.

According to this paradigm, a difficult problem is solved by dividing it into a number of 
simpler problems and then combining the solutions to those simple problems. In the 
context of our present discussion, the difficult problem is that of data transmission over a 
wideband channel with severe intersymbol interference, and the simpler problems are 
exemplified by data transmission over relatively straightforward AWGN channels. We 
may thus describe the essence of DMT theory, as follows:

Naturally, the overall data rate is the sum of the individual data rates over the subchannels
designed to operate in parallel: this new way of thinking on signaling over wideband 
channels is entirely different from the approach described in the first part of the chapter, in 
that it builds on ideas described in Chapter 5 on Shannon’s information theory and in 
Chapter 7 on signaling over AWGN channels. 

Capacity of AWGN Channel Revisited 


At the heart of discrete multichannel data transmission theory is Shannon’s information
capacity law, discussed in Chapter 5 on information theory. According to this law, the
capacity of an AWGN channel (free from ISI) is defined by

$$C = B \log_2(1 + \text{SNR}) \text{ bits/s} \tag{8.51}$$

where $B$ is the channel bandwidth in hertz and SNR is measured at the channel output.
Equation (8.51) teaches us that, for a given SNR, we can transmit data over an AWGN
channel of bandwidth $B$ at the maximum rate of $C$ bits/s with arbitrarily small probability of
error, provided that we employ an encoding system of sufficiently high complexity.
Equivalently, we may express the capacity $C$ in bits per transmission or channel use as

$$C = \frac{1}{2} \log_2(1 + \text{SNR}) \text{ bits per transmission} \tag{8.52}$$


In practice, we usually find that a physically realizable encoding system must transmit data
at a rate $R$ less than the maximum possible rate $C$ for it to be reliable. For an implementable
system operating at low enough probability of symbol error, we thus need to introduce an
SNR gap or just gap, denoted by $\Gamma$. The gap is a function of the permissible probability of
symbol error $P_e$ and the encoding system of interest. It provides a measure of the
“efficiency” of an encoding system with respect to the ideal transmission system of (8.52).
With $C$ denoting the capacity of the ideal encoding system and $R$ denoting the capacity of
the corresponding implementable encoding system, the gap is defined by

$$\Gamma = \frac{2^{2C} - 1}{2^{2R} - 1} = \frac{\text{SNR}}{2^{2R} - 1} \tag{8.53}$$

Rearranging (8.53) with $R$ as the focus of interest, we may write

$$R = \frac{1}{2} \log_2\left(1 + \frac{\text{SNR}}{\Gamma}\right) \text{ bits per transmission} \tag{8.54}$$

For an uncoded PAM or QAM operating at $P_e = 10^{-6}$, for example, the gap $\Gamma$ is constant at
8.8 dB. Through the use of codes (e.g., trellis codes to be discussed in Chapter 10), the gap
$\Gamma$ may be reduced to as low as 1 dB.

Let $P$ denote the transmitted signal power and $\sigma^2$ denote the channel noise variance
measured over the bandwidth $B$. The SNR is therefore

$$\text{SNR} = \frac{P}{\sigma^2}$$

where

$$\sigma^2 = N_0 B$$

We may thus finally define the attainable data rate as

$$R = \frac{1}{2} \log_2\left(1 + \frac{P}{\Gamma \sigma^2}\right) \text{ bits per transmission} \tag{8.55}$$


With this modified version of Shannon’s information capacity law at hand, we are ready to 
describe discrete multichannel modulation in quantitative terms. 
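
A short numerical sketch may help fix the relationship between capacity, attainable rate, and gap. The function names and the 30 dB operating point are hypothetical; the 8.8 dB and 1 dB gap values are the ones quoted above.

```python
import math


def capacity_bits_per_transmission(snr):
    """C = (1/2) log2(1 + SNR), with SNR given as a linear power ratio."""
    return 0.5 * math.log2(1.0 + snr)


def attainable_rate(snr, gap_db):
    """R = (1/2) log2(1 + SNR / gap), with the gap supplied in decibels."""
    gap = 10.0 ** (gap_db / 10.0)
    return 0.5 * math.log2(1.0 + snr / gap)


def gap_from_rates(capacity, rate):
    """Recover the gap from its definition: (2^(2C) - 1) / (2^(2R) - 1)."""
    return (2.0 ** (2.0 * capacity) - 1.0) / (2.0 ** (2.0 * rate) - 1.0)


snr = 1000.0  # hypothetical operating point, i.e. 30 dB
c = capacity_bits_per_transmission(snr)
for gap_db in (8.8, 1.0):
    r = attainable_rate(snr, gap_db)
    recovered_db = 10.0 * math.log10(gap_from_rates(c, r))
    print(gap_db, round(c, 3), round(r, 3), round(recovered_db, 3))
```

Feeding the computed rate back into the defining ratio returns the gap that was assumed, which confirms that the rate expression is a rearrangement of the gap definition.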


Partitioning Continuous-Time Channel into a Set 
of Subchannels 


To be specific in practical terms, consider a linear wideband channel (e.g., twisted
wire-pair) with an arbitrary frequency response $H(f)$. Let the magnitude response of the
channel, denoted by $|H(f)|$, be approximated by a staircase function as illustrated in Figure
8.24, with $\Delta f$ denoting the width of each frequency step (i.e., subchannel). In the limit, as
the frequency increment $\Delta f$ approaches zero, the staircase approximation of the channel
approaches the actual $H(f)$. Along each step of the approximation, the channel may be
assumed to operate as an AWGN channel free from intersymbol interference. The problem
of transmitting a single wideband signal is thereby transformed into the transmission of a
set of narrowband orthogonal signals. Each orthogonal narrowband signal, with its own
carrier, is generated using a spectrally efficient modulation technique such as $M$-ary QAM,
with AWGN being essentially the only primary source of transmission impairment. This
scenario, in turn, means that data transmission over each subchannel of bandwidth $\Delta f$ can
be optimized by invoking a modified form of Shannon’s information capacity law, with the
optimization of each subchannel being performed independently of all the others. Thus, in
practical signal-processing terms, we may make the following statement:


Although the resulting complexity of a DMT system so described is indeed high for a 
large number of subchannels, implementation of the entire system can be accomplished in 
a cost-effective manner through the combined use of efficient digital signal-processing 
algorithms and very-large-scale integration technology. 

Figure 8.25 shows a block diagram of the DMT system in its most basic form. The 
system configured here uses QAM, whose choice is justified by virtue of its spectral 
efficiency. The incoming binary data stream is first applied to a demultiplexer (not shown in 
the figure), thereby producing a set of N substreams. Each substream represents a sequence 
of two-element subsymbols, which, for the symbol interval 0 < t < T, is denoted by

$$(a_n, b_n), \quad n = 1, 2, \ldots, N$$

where $a_n$ and $b_n$ are element values along the two coordinates of subchannel $n$.

Figure 8.24 Staircase approximation of an arbitrary magnitude response of
a channel, $|H(f)|$; only positive-frequency portion of the response is shown.


Correspondingly, the passband basis functions of the quadrature-amplitude modulators
are defined by the following function pairs:

$$\{\phi(t)\cos(2\pi f_n t),\ \phi(t)\sin(2\pi f_n t)\}, \quad n = 1, 2, \ldots, N \tag{8.56}$$

The carrier frequency $f_n$ of the $n$th modulator described in (8.56) is an integer multiple of
the symbol rate $1/T$, as shown by

$$f_n = \frac{n}{T}, \quad n = 1, 2, \ldots, N$$

and the low-pass function $\phi(t)$, common to all the subchannels, is the sinc function

$$\phi(t) = \sqrt{\frac{2}{T}}\,\mathrm{sinc}\!\left(\frac{t}{T}\right), \quad -\infty < t < \infty$$

The passband basis functions defined here have the following desirable properties, whose 
proofs are presented as an end-of-chapter problem. 


This orthogonal relationship provides the basis for formulating the signal constellation for 
each of the N modulators in the form of a squared lattice. 

Equation (8.60) provides the mathematical basis for ensuring that the N modulator-
demodulator pairs operate independently of each other. 


Thus, in light of these three properties, the original wideband channel is partitioned into an 
ideal setting of independent subchannels operating in continuous time. 

Figure 8.25 Block diagram of DMT system.

Figure 8.25 also includes the corresponding structure of the receiver. It consists of a
bank of N coherent detectors, with the channel output being simultaneously applied to the
detector inputs, operating in parallel. Each detector is supplied with a locally generated
pair of quadrature-modulated sinc functions operating in synchrony with the pair of
passband basis functions applied to the corresponding modulator in the transmitter.

It is possible for each subchannel to have some residual ISI. However, as the number of 
subchannels N approaches infinity, the ISI disappears for all practical purposes. From a 
theoretical perspective, we find that, for a sufficiently large N, the bank of coherent 
detectors in Figure 8.25 operates as maximum likelihood detectors, operating
independently of each other and on a subsymbol-by-subsymbol basis. (Maximum 
likelihood detection was discussed in Chapter 7.) 

To define the detector outputs in response to the input subsymbols, we find it
convenient to use complex notation. Let $A_n$ denote the subsymbol applied to the $n$th
modulator during the symbol interval 0 < t < T, as shown by

$$A_n = a_n + j b_n, \quad n = 1, 2, \ldots, N$$

The corresponding detector output is expressed as follows:

$$Y_n = H_n A_n + W_n, \quad n = 1, 2, \ldots, N \tag{8.62}$$

where $H_n$ is the complex-valued frequency response of the channel evaluated at the
subchannel carrier frequency $f = f_n$, that is,

$$H_n = H(f_n), \quad n = 1, 2, \ldots, N$$

The $W_n$ in (8.62) is a complex-valued random variable produced by the channel noise
$w(t)$; the real and imaginary parts of $W_n$ have zero mean and variance $N_0/2$. With
knowledge of the measured frequency response $H(f)$ available, we may therefore use
(8.62) to compute a maximum likelihood estimate of the transmitted subsymbol $A_n$. The
estimates $\hat{A}_1, \hat{A}_2, \ldots, \hat{A}_N$ so obtained are finally multiplexed to produce the overall
estimate of the original binary data transmitted during the interval 0 < t < T.
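
For Gaussian noise with equal variance in the real and imaginary parts, the maximum likelihood estimate of each subsymbol reduces to a minimum-distance rule applied independently per subchannel. The sketch below illustrates that rule; the 4-QAM constellation, the channel values, and the noise samples are hypothetical.

```python
def ml_detect(y_n, h_n, constellation):
    """Pick the constellation point A that minimizes |Y_n - H_n A|^2."""
    return min(constellation, key=lambda a: abs(y_n - h_n * a) ** 2)


# Hypothetical 4-QAM constellation and measured subchannel responses H_n = H(f_n).
QAM4 = [complex(a, b) for a in (-1.0, 1.0) for b in (-1.0, 1.0)]
H = [0.8 + 0.2j, 0.3 - 0.4j]
A = [complex(1.0, 1.0), complex(-1.0, 1.0)]  # transmitted subsymbols a_n + j b_n
W = [0.05 - 0.02j, -0.03 + 0.04j]            # small noise samples
Y = [h * a + w for h, a, w in zip(H, A, W)]  # Y_n = H_n A_n + W_n

estimates = [ml_detect(y, h, QAM4) for y, h in zip(Y, H)]
print(estimates)
```

Each subchannel is detected without reference to the others, which is the subsymbol-by-subsymbol operation described above.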

To summarize, for a sufficiently large N, we may implement the receiver as an optimum 
maximum likelihood detector that operates as N subsymbol-by-subsymbol detectors. The 
rationale for building a maximum likelihood receiver in such a simple way is motivated by 
the following property: 


In the DMT system of Figure 8.25, each subchannel is characterized by an SNR of its 
own. It would be highly desirable, therefore, to derive a single measure for the 
performance of the entire system in Figure 8.25. 

To simplify the derivation of such a measure, we assume that all of the subchannels in 
Figure 8.25 are represented by one-dimensional constellations. Then, using the modified 
Shannon information capacity law of (8.55), the channel capacity of the entire system is 
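successively expressed as follows: 

$$\begin{aligned}
R &= \frac{1}{N}\sum_{n=1}^{N} R_n \\
  &= \frac{1}{2N}\sum_{n=1}^{N}\log_2\left(1 + \frac{P_n}{\Gamma\sigma_n^2}\right) \\
  &= \frac{1}{2N}\log_2\prod_{n=1}^{N}\left(1 + \frac{P_n}{\Gamma\sigma_n^2}\right) \\
  &= \frac{1}{2}\log_2\left[\prod_{n=1}^{N}\left(1 + \frac{P_n}{\Gamma\sigma_n^2}\right)\right]^{1/N} \quad \text{bits per transmission}
\end{aligned} \tag{8.64}$$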


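Let $(\mathrm{SNR})_{\text{overall}}$ denote the overall SNR of the entire DMT system. Then, in light of 
(8.54), we may express the rate R as 

$$R = \frac{1}{2}\log_2\left(1 + \frac{(\mathrm{SNR})_{\text{overall}}}{\Gamma}\right) \quad \text{bits per transmission} \tag{8.65}$$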

Accordingly, comparing (8.65) with (8.64) and rearranging terms, we may write 

$$(\mathrm{SNR})_{\text{overall}} = \Gamma\left[\prod_{n=1}^{N}\left(1 + \frac{P_n}{\Gamma\sigma_n^2}\right)^{1/N} - 1\right] \tag{8.66}$$

Assuming that the SNR, namely $P_n/(\Gamma\sigma_n^2)$, is large enough to ignore the two unity terms 
on the right-hand side of (8.66), we may approximate the overall SNR simply as follows: 

$$(\mathrm{SNR})_{\text{overall}} \approx \prod_{n=1}^{N}\left(\frac{P_n}{\sigma_n^2}\right)^{1/N} \tag{8.67}$$

which is independent of the gap $\Gamma$. We may thus characterize the overall system by an 
SNR that is the geometric mean of the SNRs of the individual subchannels. 
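
To see how close the geometric-mean approximation of (8.67) comes to the exact expression of (8.66), the following Python sketch evaluates both. The function names and the numerical values in the usage note are assumptions chosen only for illustration.

```python
import math

def overall_snr_exact(P, sigma2, gap):
    """Overall SNR of (8.66): gap * (prod_n (1 + P_n/(gap*sigma_n^2))^(1/N) - 1)."""
    N = len(P)
    # Sum logarithms instead of multiplying, so long products do not overflow.
    log_prod = sum(math.log1p(p / (gap * s)) for p, s in zip(P, sigma2))
    return gap * (math.exp(log_prod / N) - 1.0)

def overall_snr_approx(P, sigma2):
    """High-SNR approximation of (8.67): geometric mean of P_n / sigma_n^2."""
    N = len(P)
    return math.exp(sum(math.log(p / s) for p, s in zip(P, sigma2)) / N)
```

With the assumed values P = [100, 400, 1600], unit noise variances, and a gap of 4, the approximation gives 400, whereas the exact formula gives a value about 0.7 percent larger; the two agree more closely as the subchannel SNRs grow.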

The geometric form of the SNR of (8.67) can be improved considerably by distributing 
the available transmit power among the N subchannels on a nonuniform basis. This 
objective is attained through the use of loading, which is discussed next. 


Equation (8.64) for the bit rate of the entire DMT system ignores the effect of the channel 
on system performance. To account for this effect, define 

$$g_n = |H(f_n)|, \qquad n = 1, 2, \ldots, N$$

Then, assuming that the number of subchannels N is large enough, we may treat $g_n$ as a 
constant over the entire bandwidth $\Delta f$ assigned to subchannel n for all n. In such a case, we 
may modify the second line of (8.64) for the bit rate of the system into 

$$R = \frac{1}{2N}\sum_{n=1}^{N}\log_2\left(1 + \frac{g_n^2 P_n}{\Gamma\sigma_n^2}\right) \tag{8.69}$$
where the $g_n^2$ and $\Gamma$ are usually fixed. The noise variance $\sigma_n^2$ is $\Delta f N_0$ for all n, where $\Delta f$ 
is the bandwidth of each subchannel and $N_0/2$ is the noise power spectral density of the 
subchannel. We may therefore optimize the overall bit rate R through a proper allocation 
of the total transmit power among the various subchannels. However, for this optimization 
to be of practical value, we must maintain the total transmit power at some constant value 
denoted by P, as shown by 

$$\sum_{n=1}^{N} P_n = P \tag{8.70}$$


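The optimization we therefore have to deal with is a constrained optimization problem, 
stated as follows: maximize the bit rate R of (8.69) with respect to the subchannel powers 
$P_1, P_2, \ldots, P_N$, subject to the constraint (8.70) that the total transmit power P is held constant. 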


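To solve this optimization problem, we first use the method of Lagrange multipliers to set 
up an objective function (i.e., the Lagrangian) that incorporates (8.69) and the constraint 
of (8.70) as shown by 

$$\begin{aligned}
J &= \frac{1}{2N}\sum_{n=1}^{N}\log_2\left(1 + \frac{g_n^2 P_n}{\Gamma\sigma_n^2}\right) + \lambda\left(P - \sum_{n=1}^{N} P_n\right) \\
  &= \frac{1}{2N}\log_2 e\sum_{n=1}^{N}\ln\left(1 + \frac{g_n^2 P_n}{\Gamma\sigma_n^2}\right) + \lambda\left(P - \sum_{n=1}^{N} P_n\right)
\end{aligned} \tag{8.71}$$

where $\lambda$ is the Lagrange multiplier; in the second line of (8.71) the logarithm to base 2 has 
been changed to the natural logarithm, which introduces the factor $\log_2 e$. Hence, differentiating the 
Lagrangian J with respect to $P_n$, then setting the result equal to zero and finally 
rearranging terms, we get 

$$\frac{\dfrac{1}{2N}\log_2 e}{P_n + \dfrac{\Gamma\sigma_n^2}{g_n^2}} = \lambda \tag{8.72}$$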

The result of (8.72) indicates that the solution to our constrained optimization problem is 
to have 

$$P_n + \frac{\Gamma\sigma_n^2}{g_n^2} = K, \qquad n = 1, 2, \ldots, N \tag{8.73}$$

where K is a prescribed constant under the designer’s control. That is, the sum of the 
transmit power and the noise variance (power) scaled by the ratio $\Gamma/g_n^2$ must be 
maintained constant for each subchannel. The process of allocating the transmit power P 
to the individual subchannels so as to maximize the bit rate of the entire multichannel 
transmission system is called loading; this term is not to be confused with loading coils 
used in twisted wire-pairs. 


Water-Filling Interpretation of the Constrained 
Optimization Problem 


In solving the constrained optimization problem just described, the two conditions of 
(8.70) and (8.73) must both be satisfied. The optimum solution so defined has an 
interesting interpretation, as illustrated in Figure 8.26 for N = 6, assuming that the gap $\Gamma$ 
is maintained constant over all the subchannels. To simplify the illustration in Figure 8.26, 
we have set $\sigma_n^2 = N_0 \Delta f = 1$; that is, the average noise power is unity for all N 
subchannels. Referring to this figure, we may now make three observations: 

1. With $\sigma_n^2 = 1$, the sum of power $P_n$ allocated to subchannel n and the scaled noise 
power $\Gamma/g_n^2$ satisfies the constraint of (8.73) for four of the subchannels for a 
prescribed value of the constant K. 

2. The sum of power allocations to these four subchannels consumes all the available 
transmit power, maintained at the constant value P. 

3. The remaining two subchannels have been eliminated from consideration because 
they would each require negative power to satisfy (8.73) for the prescribed value of 
the constant K; from a physical perspective, this condition is clearly unacceptable. 
The interpretation illustrated in Figure 8.26 prompts us to refer to the optimum solution of 
(8.73), subject to the constraint of (8.70), as the water-filling solution; the principle of 
water-filling was discussed under Shannon’s information theory in Chapter 5. This terminology 
follows from analogy of our optimization problem with a fixed amount of water, standing 
for transmit power, being poured into a container with a number of connected regions, each 
having a different depth, standing for noise power. In such a scenario, the water distributes 
itself in such a way that a constant water level is attained across the whole container, hence 
the term “water filling.” 

Figure 8.26 Water-filling interpretation of the loading problem. (Horizontal axis: index of subchannel, n.) 

Returning to the task of how to allocate the fixed transmit power P among the various 
subchannels of a multichannel data transmission system so as to optimize the bit rate of 
the entire system, we may proceed along the following pair of steps: 

1. Let the total transmit power be fixed at the constant value P as in (8.70). 

2. Let K denote the constant value prescribed for the sum $P_n + \Gamma\sigma_n^2/g_n^2$ for all n as in 
(8.73). 

On the basis of these two steps, we may then set up the following system of simultaneous 
equations: 

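$$\begin{aligned}
P_1 + P_2 + \cdots + P_N &= P \\
P_1 - K &= -\Gamma\sigma^2/g_1^2 \\
P_2 - K &= -\Gamma\sigma^2/g_2^2 \\
&\;\;\vdots \\
P_N - K &= -\Gamma\sigma^2/g_N^2
\end{aligned} \tag{8.74}$$

where we have a total of (N + 1) unknowns and (N + 1) equations to solve for them. Using 
matrix notation, we may rewrite this system of N + 1 simultaneous equations in the 
compact form 

$$\begin{bmatrix}
1 & 1 & \cdots & 1 & 0 \\
1 & 0 & \cdots & 0 & -1 \\
0 & 1 & \cdots & 0 & -1 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 1 & -1
\end{bmatrix}
\begin{bmatrix} P_1 \\ P_2 \\ \vdots \\ P_N \\ K \end{bmatrix}
=
\begin{bmatrix} P \\ -\Gamma\sigma^2/g_1^2 \\ -\Gamma\sigma^2/g_2^2 \\ \vdots \\ -\Gamma\sigma^2/g_N^2 \end{bmatrix} \tag{8.75}$$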


Premultiplying both sides of (8.75) by the inverse of the (N + 1)-by-(N + 1) matrix on the 
left-hand side of the equation, we obtain solutions for the unknowns $P_1, P_2, \ldots, P_N$, and K. 
We should always find that K is positive, but it is possible for some of the P values to be 
negative. In such a situation, the negative P values are discarded, as power cannot be 
negative for physical reasons. 
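
The procedure just described can be sketched in a few lines of Python. Because every equation after the first in (8.74) has the form $P_n = K - \Gamma\sigma^2/g_n^2$, the system can be solved in closed form instead of by matrix inversion: summing over the retained subchannels gives K as the total power plus the sum of the scaled noise terms, divided by the number of retained subchannels. The sketch makes one assumption that goes beyond the text: after a subchannel with negative power is discarded, the remaining equations are solved again, and this is repeated until every retained power is nonnegative. The function and argument names are illustrative.

```python
def water_fill(total_power, gains_sq, gap=1.0, noise_var=1.0):
    """Water-filling allocation of total_power over the subchannels.

    gains_sq[n] holds g_n^2. Returns (K, powers); discarded subchannels get 0.0.
    """
    if not gains_sq:
        raise ValueError("at least one subchannel is required")
    # Scaled noise "floor" of each subchannel: gap * sigma^2 / g_n^2
    floors = [gap * noise_var / g2 for g2 in gains_sq]
    # Keep the active set sorted so the highest floor is always last.
    active = sorted(range(len(floors)), key=lambda n: floors[n])
    K = 0.0
    while active:
        # Water level from sum over active n of (K - floor_n) = total_power
        K = (total_power + sum(floors[n] for n in active)) / len(active)
        if K - floors[active[-1]] >= 0.0:
            break            # every retained power is nonnegative
        active.pop()         # discard the worst subchannel and solve again
    powers = [0.0] * len(floors)
    for n in active:
        powers[n] = K - floors[n]
    return K, powers
```

As a usage example, `water_fill(10.0, [1.0, 0.1])` with unit gap and unit noise variance yields K = 10.5 and powers 9.5 and 0.5, whereas `water_fill(1.0, [1.0, 0.1])` discards the second subchannel and assigns all of the power to the first.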


Linear Channel with Squared Magnitude Response 

Consider a linear channel whose squared magnitude response $|H(f)|^2$ has the 
piecewise-linear form shown in Figure 8.27. To simplify the example, we have set the gap $\Gamma = 1$ and 
the noise variance $\sigma^2 = 1$. 

Figure 8.27 Squared magnitude response for Example 5. 


Under this set of values, the application of (8.74) yields 

$$\begin{aligned}
P_1 + P_2 &= P \\
P_1 - K &= -1 \\
P_2 - K &= -1/l
\end{aligned}$$

where the new parameter 0 < l < 1 has been introduced to distinguish the third equation 
from the second one. Solving these three simultaneous equations for $P_1$, $P_2$, and K, we get 

$$\begin{aligned}
P_1 &= \frac{1}{2}\left(P - 1 + \frac{1}{l}\right) \\
P_2 &= \frac{1}{2}\left(P + 1 - \frac{1}{l}\right) \\
K &= \frac{1}{2}\left(P + 1 + \frac{1}{l}\right)
\end{aligned}$$

Since 0 < l < 1, it follows that $P_1 > 0$, but it is possible for $P_2$ to be negative. This latter 
condition can arise if 

$$l < \frac{1}{P + 1}$$

But then $P_1$ exceeds the prescribed value of transmit power P. Therefore, it follows that, in 
this example, the only acceptable solution is to have 1/(P + 1) < l < 1. Suppose then we 
have P = 10 and l = 0.1; under these two conditions the desired solution is 

$$K = 10.5, \qquad P_1 = 9.5, \qquad P_2 = 0.5$$

The corresponding water-filling picture for the problem at hand is portrayed in Figure 8.28. 

Figure 8.28 Water-filling profile for Example 5. 
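
As a quick check of the numbers in this example, the three simultaneous equations may be solved by elimination. The short derivation below only restates that algebra with P = 10 and l = 0.1 substituted; no new result is implied.

```latex
\begin{aligned}
P_1 &= K - 1, \qquad P_2 = K - \frac{1}{l} = K - 10 \\
P_1 + P_2 &= 2K - 11 = 10 \;\Longrightarrow\; K = 10.5 \\
P_1 &= 10.5 - 1 = 9.5, \qquad P_2 = 10.5 - 10 = 0.5
\end{aligned}
```

Since $1/(P + 1) = 1/11 \approx 0.091$ is smaller than l = 0.1, both powers come out positive, and no subchannel has to be discarded.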



DMT System using Discrete Fourier Transform 


The material presented in Sections 8.13 and 8.14 provides an insightful introduction to the 
notion of multicarrier modulation in a DMT system. In particular, the continuous-time 
channel partitioning induced by the passband (modulated) basis functions of (8.56), or 
equivalently (8.59) in complex terms, exhibits a highly desirable property described as follows: 


However, the DSL system so described has two practical shortcomings: 

1. The passband basis functions use a sinc function that is nonzero for an infinite time 
interval, whereas practical considerations favor a finite observation interval. 

2. For a finite number of subchannels N the system is suboptimal; optimality of the 
system is assured only when N approaches infinity. 

We may overcome both shortcomings by using DMT, the basic idea of which is to 
transform a noisy wideband channel into a set of N subchannels operating in parallel. 
What makes DMT distinctive is the fact that the transformation is performed in discrete 
time as well as discrete frequency, paving the way for exploiting digital signal processing. 
Specifically, the input-output behavior of the entire communication system 
admits a linear matrix representation, which lends itself to implementation using the DFT. 
Recall from Chapter 2 on Fourier analysis of signals and systems that 
the DFT is the result of discretizing the Fourier transform both in time and frequency. 

To exploit this new approach, we first recognize that in a realistic situation the channel 
has its nonzero impulse response h(t) essentially confined to a finite interval $[0, T_b]$. So, 
let the sequence $h_0, h_1, \ldots, h_\nu$ denote the baseband equivalent impulse response of the 
channel sampled at the rate $1/T_s$, with 

$$T_b = (1 + \nu)T_s$$

where the role of $\nu$ is to be clarified. The sampling rate $1/T_s$ is chosen to be greater than 
twice the highest frequency component of interest in accordance with the sampling 
theorem. To continue with the discrete-time description of the system, let $s_n = s(nT_s)$ 
denote a sample of the transmitted symbol s(t), $w_n = w(nT_s)$ denote a sample of the 
channel noise w(t), and $x_n = x(nT_s)$ denote the corresponding sample of the channel output 
(i.e., received signal). The channel performs linear convolution on the incoming symbol 
sequence $\{s_n\}$ of length N to produce a channel output sequence $\{x_n\}$ of length $N + \nu$. 
Extension of the channel output sequence by $\nu$ samples compared with the channel input 
sequence is due to the intersymbol interference produced by the channel. 

To overcome the effect of ISI, we create a cyclically extended guard interval, whereby 
each symbol sequence is preceded by a periodic extension of the sequence itself. 
Specifically, the last $\nu$ samples of the symbol sequence are repeated at the beginning of the 
sequence being transmitted, as shown by 

$$s_{-k} = s_{N-k} \qquad \text{for } k = 1, 2, \ldots, \nu \tag{8.77}$$

The condition described in (8.77) is called a cyclic prefix. The excess bandwidth factor due 
to the inclusion of the cyclic prefix is therefore $\nu/N$, where N is the number of transmitted 
samples after the guard interval. 
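
The effect of the cyclic prefix can be checked numerically. The Python sketch below is an illustration under stated assumptions: sequences are stored in natural order (index 0 first), the channel is noise-free, and the helper names are hypothetical. It shows that, once the first $\nu$ received samples are dropped, linear convolution of the prefixed block with the channel reproduces the N-point circular convolution of the original block with the channel.

```python
def add_cyclic_prefix(s, nu):
    """Prepend the last nu samples of the list s, as in (8.77)."""
    return s[len(s) - nu:] + s

def linear_convolve(h, s):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(h) + len(s) - 1)
    for i, hi in enumerate(h):
        for j, sj in enumerate(s):
            out[i + j] += hi * sj
    return out

def circular_convolve(h, s):
    """N-point circular convolution, with N = len(s) >= len(h)."""
    N = len(s)
    return [sum(h[m] * s[(n - m) % N] for m in range(len(h))) for n in range(N)]

def channel_with_prefix(h, s):
    """Send s with a prefix of length nu = len(h) - 1, then drop the guard samples."""
    nu = len(h) - 1
    y = linear_convolve(h, add_cyclic_prefix(s, nu))
    return y[nu:nu + len(s)]
```

For a channel with $\nu + 1 = 3$ taps and a block of N = 6 samples, `channel_with_prefix(h, s)` and `circular_convolve(h, s)` return the same six values, at the cost of an excess factor of $\nu/N = 1/3$.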

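With the cyclic prefix in place, the matrix description of the channel now takes the new 
form 

$$\begin{bmatrix} x_{N-1} \\ x_{N-2} \\ \vdots \\ x_{N-\nu-1} \\ x_{N-\nu-2} \\ \vdots \\ x_0 \end{bmatrix}
=
\begin{bmatrix}
h_0 & h_1 & h_2 & \cdots & h_{\nu-1} & h_\nu & 0 & \cdots & 0 \\
0 & h_0 & h_1 & \cdots & h_{\nu-2} & h_{\nu-1} & h_\nu & \cdots & 0 \\
\vdots & & & & & & & & \vdots \\
0 & 0 & 0 & \cdots & 0 & h_0 & h_1 & \cdots & h_\nu \\
h_\nu & 0 & 0 & \cdots & 0 & 0 & h_0 & \cdots & h_{\nu-1} \\
\vdots & & & & & & & & \vdots \\
h_1 & h_2 & h_3 & \cdots & h_\nu & 0 & 0 & \cdots & h_0
\end{bmatrix}
\begin{bmatrix} s_{N-1} \\ s_{N-2} \\ \vdots \\ s_{N-\nu-1} \\ s_{N-\nu-2} \\ \vdots \\ s_0 \end{bmatrix}
+
\begin{bmatrix} w_{N-1} \\ w_{N-2} \\ \vdots \\ w_{N-\nu-1} \\ w_{N-\nu-2} \\ \vdots \\ w_0 \end{bmatrix}$$
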

In a compact way, we may describe the discrete-time representation of the channel in the 
matrix form 

$$\mathbf{x} = \mathbf{H}\mathbf{s} + \mathbf{w} \tag{8.79}$$

where the transmitted symbol vector s, the channel noise vector w, and the received signal 
vector x are all N-by-1 vectors that are respectively defined as follows: 

$$\mathbf{s} = [s_{N-1}, s_{N-2}, \ldots, s_0]^T$$

$$\mathbf{w} = [w_{N-1}, w_{N-2}, \ldots, w_0]^T$$

$$\mathbf{x} = [x_{N-1}, x_{N-2}, \ldots, x_0]^T$$

Figure 8.29 Discrete-time representation of multichannel data transmission system. 


We may thus simply depict the discrete-time representation of the channel as in Figure 
8.29. The N-by-N channel matrix H is itself defined by 


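$$\mathbf{H} = \begin{bmatrix}
h_0 & h_1 & h_2 & \cdots & h_{\nu-1} & h_\nu & 0 & \cdots & 0 \\
0 & h_0 & h_1 & \cdots & h_{\nu-2} & h_{\nu-1} & h_\nu & \cdots & 0 \\
\vdots & & & & & & & & \vdots \\
0 & 0 & 0 & \cdots & 0 & h_0 & h_1 & \cdots & h_\nu \\
h_\nu & 0 & 0 & \cdots & 0 & 0 & h_0 & \cdots & h_{\nu-1} \\
\vdots & & & & & & & & \vdots \\
h_1 & h_2 & h_3 & \cdots & h_\nu & 0 & 0 & \cdots & h_0
\end{bmatrix} \tag{8.83}$$
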


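From the definition in (8.83), we readily see that the matrix H has the following structural 
composition: each row of the matrix is obtained by cyclically shifting the previous row to 
the right by one position. 

Accordingly, the matrix H is referred to as a circulant matrix. 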

Before proceeding further, it is befitting that we briefly review the DFT and its role in 
the spectral decomposition of the circulant matrix H. 


Consider the N-by-1 vector x of (8.79). Let the DFT of the vector x be denoted by the 
N-by-1 vector 

$$\mathbf{X} = [X_{N-1}, X_{N-2}, \ldots, X_0]^T$$

whose kth element is defined by 

$$X_k = \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} x_n \exp\left(-j\frac{2\pi}{N}kn\right), \qquad k = 0, 1, \ldots, N-1 \tag{8.85}$$

The exponential term $\exp(-j2\pi kn/N)$ is the kernel of the DFT. Correspondingly, the IDFT 
(i.e., inverse DFT) of the N-by-1 vector X is defined by 

$$x_n = \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} X_k \exp\left(j\frac{2\pi}{N}kn\right), \qquad n = 0, 1, \ldots, N-1 \tag{8.86}$$

Equations (8.85) and (8.86) follow from discretizing the continuous-time Fourier 
transform both in time and frequency, as discussed in Chapter 2, with one difference: the 
DFT in (8.85) and its inverse in (8.86) have the same scaling factor, $1/\sqrt{N}$, for the purpose 
of symmetry. 

Although the DFT and IDFT appear to be similar in their mathematical formulations, 
their interpretations are different, as discussed previously in Chapter 2. As a reminder, we 
may interpret the DFT process described in (8.85) as a system of N complex heterodyning 
and averaging operations, as shown in Figure 2.32a. In the picture depicted in this part of 
the figure, heterodyning refers to the multiplication of the data sequence $x_n$ by one of N 
complex exponentials, $\exp(-j2\pi kn/N)$. As such, (8.85) may be viewed as the analysis 
equation. For the interpretation of (8.86), we may view it as the synthesis equation: 
specifically, the complex Fourier coefficient $X_k$ is weighted by one of N complex 
exponentials $\exp(j2\pi kn/N)$. At time n, the output $x_n$ is formed by summing the weighted 
complex Fourier coefficients, as shown in Figure 2.32b. 
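
The analysis and synthesis equations translate directly into code. The following Python sketch is a direct, order-$N^2$ evaluation of (8.85) and (8.86) with the symmetric $1/\sqrt{N}$ scaling; it assumes natural index ordering (element 0 first) rather than the reversed ordering used for the vectors in the text, and the function names are illustrative.

```python
import cmath
import math

def dft(x):
    """Analysis equation (8.85): X_k = (1/sqrt(N)) sum_n x_n exp(-j 2 pi k n / N)."""
    N = len(x)
    scale = 1.0 / math.sqrt(N)
    return [scale * sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Synthesis equation (8.86): x_n = (1/sqrt(N)) sum_k X_k exp(+j 2 pi k n / N)."""
    N = len(X)
    scale = 1.0 / math.sqrt(N)
    return [scale * sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N))
            for n in range(N)]
```

With this symmetric scaling, `idft(dft(x))` returns x to within rounding error, and the energy of the sequence is the same in both domains, which is one practical reason for preferring the $1/\sqrt{N}$ convention here.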

An important property of a circulant matrix, exemplified by the channel matrix H of 
(8.83), is that it permits the spectral decomposition defined by 

$$\mathbf{H} = \mathbf{Q}^\dagger\boldsymbol{\Lambda}\mathbf{Q} \tag{8.87}$$

where the superscript † denotes Hermitian transposition (i.e., the combination of complex 
conjugation and ordinary matrix transposition). Descriptions of the matrices Q and Λ are 
presented in the following in that order. The matrix Q is a square matrix defined in terms 
of the kernel of the N-point DFT as shown by 


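$$\mathbf{Q} = \frac{1}{\sqrt{N}}\begin{bmatrix}
\exp\left(-j\frac{2\pi}{N}(N-1)(N-1)\right) & \cdots & \exp\left(-j\frac{2\pi}{N}2(N-1)\right) & \exp\left(-j\frac{2\pi}{N}(N-1)\right) & 1 \\
\exp\left(-j\frac{2\pi}{N}(N-1)(N-2)\right) & \cdots & \exp\left(-j\frac{2\pi}{N}2(N-2)\right) & \exp\left(-j\frac{2\pi}{N}(N-2)\right) & 1 \\
\vdots & & \vdots & \vdots & \vdots \\
\exp\left(-j\frac{2\pi}{N}(N-1)\right) & \cdots & \exp\left(-j\frac{2\pi}{N}2\right) & \exp\left(-j\frac{2\pi}{N}\right) & 1 \\
1 & \cdots & 1 & 1 & 1
\end{bmatrix}$$

From this definition, we readily see that the klth element of the N-by-N matrix Q, starting 
from the bottom right at k = 0 and l = 0 and counting up step-by-step, is 

$$q_{kl} = \frac{1}{\sqrt{N}}\exp\left(-j\frac{2\pi}{N}kl\right), \qquad k, l = 0, 1, \ldots, N-1$$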

The matrix Q is an orthonormal matrix or unitary matrix, in the sense that it satisfies the 
condition 

$$\mathbf{Q}\mathbf{Q}^\dagger = \mathbf{I} \tag{8.90}$$

where I is the identity matrix. That is, the inverse matrix of Q is equal to the Hermitian 
transpose of Q. 




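The matrix Λ in (8.87) is a diagonal matrix that contains the N DFT values of the 
sequence $h_0, h_1, \ldots, h_\nu$ that characterize the channel. Denoting these transform values by 
$\lambda_{N-1}, \ldots, \lambda_1, \lambda_0$, respectively, we may express Λ as 

$$\boldsymbol{\Lambda} = \begin{bmatrix}
\lambda_{N-1} & 0 & \cdots & 0 \\
0 & \lambda_{N-2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_0
\end{bmatrix}$$

Note that the λ used here are not to be confused with the Lagrange multipliers in Section 8.13. 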

From a system design perspective, the DFT has established itself as one of the principal 
tools of digital signal processing by virtue of its efficient computation using the FFT 
algorithm, which was also described in Chapter 2. Computationally speaking, the FFT 
algorithm requires on the order of $N\log_2 N$ operations rather than the $N^2$ operations for 
direct computation of the DFT. For efficient implementation of the FFT algorithm, we 
should choose the block length N to be an integer power of 2. The computational savings 
obtained by using the FFT algorithm are made possible by exploiting the special structure 
of the DFT defined in (8.85). Moreover, these savings become more substantial as we 
increase the data length N. 
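
To make the source of the savings concrete, here is a minimal recursive radix-2 decimation-in-time FFT in Python, placed next to a direct evaluation of the same sums. It is a sketch rather than a tuned implementation: it computes the unnormalized transform (the factor $1/\sqrt{N}$ of (8.85) can be applied afterwards), uses natural index ordering, and the function names are illustrative. Each level of the recursion splits the block into even- and odd-indexed halves, which is where the requirement that N be a power of 2 comes from.

```python
import cmath
import math

def fft_radix2(x):
    """Unnormalized DFT of x by recursive decimation in time; len(x) must be 2^k."""
    N = len(x)
    if N < 1 or N & (N - 1):
        raise ValueError("length must be a positive integer power of 2")
    if N == 1:
        return list(x)
    even = fft_radix2(x[0::2])   # DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])    # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * math.pi * k / N) * odd[k]   # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

def dft_direct(x):
    """Direct evaluation of the same sums, on the order of N^2 operations."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]
```

For N = 1024, the direct form needs roughly $N^2 \approx 10^6$ complex multiplications, whereas the recursive form needs on the order of $N\log_2 N \approx 10^4$.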




With this brief review of the DFT and its FFT implementations at hand, we are ready to 
resume our discussion of the DMT system. First, we define 

$$\mathbf{s} = \mathbf{Q}^\dagger\mathbf{S} \tag{8.92}$$

where S is the frequency-domain vector representation of the transmitter output. Each 
element of the N-by-1 vector S may be viewed as a complex-valued point in a 
two-dimensional QAM signal constellation. Given the channel output vector x, we define its 
corresponding frequency-domain representation as 

$$\mathbf{X} = \mathbf{Q}\mathbf{x} \tag{8.93}$$

Using (8.87), (8.92), and (8.93), we may rewrite (8.79) in the equivalent form 

$$\mathbf{X} = \mathbf{Q}(\mathbf{Q}^\dagger\boldsymbol{\Lambda}\mathbf{Q}\mathbf{Q}^\dagger\mathbf{S} + \mathbf{w}) \tag{8.94}$$

Hence, using the equality of (8.90) in (8.94) we may reduce the vector X to the simple 
form 

$$\mathbf{X} = \boldsymbol{\Lambda}\mathbf{S} + \mathbf{W} \tag{8.95}$$

where 

$$\mathbf{W} = \mathbf{Q}\mathbf{w}$$

In expanded (scalar) form, the matrix equation (8.95) reads as follows: 

$$X_k = \lambda_k S_k + W_k, \qquad k = 0, 1, \ldots, N-1 \tag{8.97}$$

where the set of frequency-domain values $\{\lambda_k\}_{k=0}^{N-1}$ is known for a prescribed channel. 
Note that $X_k$ is a random variable and $W_k$ is a random variable sampled from a white 
Gaussian noise process. 
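
The scalar relation (8.97) can be verified numerically for a noise-free channel. The Python sketch below rests on a few assumptions that should be kept in mind: vectors are in natural order (index 0 first), the cyclic prefix has already turned the channel into a circular convolution, and, with the $1/\sqrt{N}$ scaling applied to X and S, the value $\lambda_k$ is taken to be the unnormalized N-point DFT of the zero-padded sequence $h_0, \ldots, h_\nu$. The function names are illustrative.

```python
import cmath
import math

def unitary_dft(x):
    """DFT with the symmetric 1/sqrt(N) scaling of (8.85)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / math.sqrt(N)
            for k in range(N)]

def channel_eigenvalues(h, N):
    """lambda_k = sum_m h_m exp(-j 2 pi k m / N), the unnormalized N-point DFT of h."""
    return [sum(h[m] * cmath.exp(-2j * math.pi * k * m / N) for m in range(len(h)))
            for k in range(N)]

def circular_convolve(h, s):
    """Action of the circulant channel on one block (noise omitted)."""
    N = len(s)
    return [sum(h[m] * s[(n - m) % N] for m in range(len(h))) for n in range(N)]

def subchannel_outputs(h, s):
    """Return (X, lam, S) so that X[k] can be compared with lam[k] * S[k]."""
    x = circular_convolve(h, s)
    return unitary_dft(x), channel_eigenvalues(h, len(s)), unitary_dft(s)
```

For any block s and any channel h of length at most N, the returned values satisfy `X[k] == lam[k] * S[k]` to within rounding error, which is the noise-free form of (8.97): the channel acts on each frequency-domain value through a single complex gain.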

For a channel with additive white noise, (8.97) leads us to make the following 
important statement: 


With the $\lambda_k$ being all known, we may thus use the block of frequency-domain values 
$\{X_k\}_{k=0}^{N-1}$ to compute estimates of the corresponding transmitted block of 
frequency-domain values $\{S_k\}_{k=0}^{N-1}$. 


Equations (8.95), (8.85), (8.86), and (8.97) provide the mathematical basis for the 
implementation of DMT using the DFT. Figure 8.30 illustrates the block diagram of the 
system derived from these equations, setting the stage for their practical roles: 


Figure 8.30 Block diagram of the DFT-based DMT system. 

The transmitter consists of the following functional blocks: 

• Demultiplexer, which converts the incoming serial data stream into parallel form. 

• Constellation encoder, which maps the parallel data into N/2 multibit 
subchannels with each subchannel being represented by a QAM signal 
constellation. Bit allocation among the subchannels is also performed here in 
accordance with a loading algorithm. 

• IDFT, which transforms the frequency-domain parallel data at the constellation 
encoder output into parallel time-domain data. For efficient implementation of 
the IDFT using the FFT algorithm, we need to choose $N = 2^k$, where k is a 
positive integer. 

• Parallel-to-serial converter, which converts the parallel time-domain data into 
serial form. Guard intervals stuffed with cyclic prefixes are inserted into the serial 
data on a periodic basis before conversion into analog form. 

• Digital-to-analog converter (DAC), which converts the digital data into analog 
form ready for transmission over the channel. 

Typically, the DAC includes a transmit filter. Accordingly, the time function h(t) in 
Figure 8.25 should be redefined as the combined impulse response of the cascade 
connection of the transmit filter and the channel. 

The receiver performs the inverse operations of the transmitter, as described here: 

• Analog-to-digital converter (ADC), which converts the analog channel output 
into digital form. 

• Serial-to-parallel converter, which converts the resulting bit stream into parallel 
form. Before this conversion takes place, the guard intervals (cyclic prefixes) are 
removed. 

• DFT, which transforms the time-domain parallel data into frequency-domain 
parallel data; as with the IDFT, the FFT algorithm is used to implement the DFT. 

• Decoder, which uses the DFT output to compute estimates of the original 
multibit subchannel data supplied to the transmitter. 

• Multiplexer, which combines the estimates so computed to produce a 
reconstruction of the transmitted serial data stream. 
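
The chain of functional blocks listed above can be exercised end to end with a short Python sketch. Several simplifying assumptions are made here that are not part of the text: the signal is treated in complex baseband, so all N frequency-domain values carry data (rather than N/2 subchannels); every subchannel uses the same 4-QAM constellation, so no loading is performed; the channel is noise-free; and the DAC, ADC, and transmit filter are omitted. All function names are illustrative.

```python
import cmath
import math

def _transform(x, sign):
    """Unitary DFT (sign = -1) or IDFT (sign = +1), evaluated directly."""
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N)) / math.sqrt(N)
            for k in range(N)]

def transmit(bits, N, nu):
    """Constellation encoder (2 bits per subchannel), IDFT, and cyclic prefix."""
    if len(bits) != 2 * N:
        raise ValueError("expected 2 bits per subchannel")
    S = [complex(2 * bits[2 * k] - 1, 2 * bits[2 * k + 1] - 1) for k in range(N)]
    s = _transform(S, +1)
    return s[len(s) - nu:] + s          # last nu samples repeated in front

def channel(tx, h):
    """Noise-free linear convolution with the sampled impulse response h."""
    out = [0j] * (len(tx) + len(h) - 1)
    for i, hi in enumerate(h):
        for j, sj in enumerate(tx):
            out[i + j] += hi * sj
    return out

def receive(rx, h, N, nu):
    """Remove the prefix, apply the DFT, and decide each subsymbol separately."""
    X = _transform(rx[nu:nu + N], -1)
    lam = [sum(h[m] * cmath.exp(-2j * math.pi * k * m / N) for m in range(len(h)))
           for k in range(N)]
    alphabet = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]
    bits = []
    for Xk, lk in zip(X, lam):
        A = min(alphabet, key=lambda a: abs(Xk - lk * a) ** 2)
        bits += [int(A.real > 0), int(A.imag > 0)]
    return bits
```

With N = 8, a three-tap channel, and a prefix of length $\nu = 2$, passing 16 bits through `transmit`, `channel`, and `receive` returns the original bits. The receiver needs only one complex gain per subchannel, which is the sense in which DMT avoids a conventional adaptive equalizer.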

To sum up: 


An important application of DMT is in the transmission of data over two-way channels. 
Indeed, DMT has been standardized for use on ADSLs using twisted wire-pairs. In ADSL, 
for example, the DMT provides for the transmission of data downstream (i.e., from an ISP 
to a subscriber) at the rate of 1.544 Mbits/s and the simultaneous transmission of data 
upstream (i.e., from the subscriber to the ISP) at 160 kbits/s. This kind of data 
transmission capability is well suited for handling data-intensive applications such as 
video-on-demand. 

DMT is also a core technology in implementing the asymmetric VDSL, which differs 
from all other DSL transmission techniques because of its ability to deliver extremely high 
data rates. For example, a VDSL can provide data rates of 13 to 26 Mbits/s downstream 
and 2 to 3 Mbits/s upstream over twisted wire-pairs that emanate from an optical network 
unit and connect to the subscriber over distances of less than about 1 km. These high data 
rates allow the delivery of digital TV, super-fast Web surfing and file transfer, and virtual 
offices at home. 

From a practical perspective, the use of DMT for implementing ADSL and VDSL 
provides a number of advantages: 

• The ability of DMT to maximize the transmitted bit rate, which is provided by 
tailoring the distribution of information-bearing signals across the channel 
according to channel attenuation and noise conditions. 

• Adaptivity to changing line conditions, which is realized by virtue of the fact that the 
channel is partitioned into a number of subchannels. 

• Reduced sensitivity to impulse noise, which is achieved by spreading its energy over 
the many subchannels of the receiver. As the name implies, impulse noise is 
characterized by long, quiet intervals followed by narrow pulses of randomly 
varying amplitude. In an ADSL or VDSL environment, impulse noise arises due to 
switching transients coupled to twisted wire-pairs in the central office and to various 
electrical devices on the user’s premises. 

• Effectively, employment of the DMT system eliminates the need for adaptive 
channel equalization. 

Summary and Discussion 


In this chapter devoted to data transmission over band-limited channels, two important 
aspects of this practical problem were discussed. 

In the first part of the chapter, we assumed that the SNR at the channel input is large 
enough for the effect of channel noise to be ignored. Under this assumption, the issue of 
dealing with intersymbol interference was viewed as a signal design problem. That is, the 
overall pulse shape p(t) is configured in such a way that p(t) is zero at the sampling times 
$nT_b$ for $n \neq 0$, where $T_b$ is the reciprocal of the bit rate $R_b$. In so doing, the intersymbol interference 
is reduced to zero. Finding the pulse shape that satisfies this requirement is best handled in 
the frequency domain. The ideal solution is a “brick-wall” spectrum that is constant over 
the interval $-W < f < W$, where $W = 1/(2T_b)$ is called the Nyquist bandwidth and $T_b$ is the 
bit duration. Unfortunately, this ideal pulse shape is impractical on two accounts: noncausal 
behavior and sensitivity to timing errors. To overcome these two practical difficulties, we 
proposed the use of an RC spectrum that rolls off gradually from a constant value over a 
prescribed band toward zero in a half-cosine-like manner on either side of the band. We 
finished this first part of the chapter by introducing the SRRC spectrum, where the overall 
pulse shaping is split equally between the transmitter and receiver; this latter form of 
signal design finds application in wireless communication. 

Turning next to the second part of the chapter, we discussed another way of tackling 
data transmission over a wideband channel by applying the engineering principle of 
“divide and conquer.” Specifically, a telephone channel, using a twisted wire-pair, is 
partitioned into a large number of narrowband subchannels, such that each noisy 
subchannel can be handled by applying Shannon’s information-capacity law. Then, 
through a series of clever mathematical steps, the treatment of a difficult “discrete 
multicarrier transmission system” is modified into a new “DMT system.” Most 
importantly, by exploiting the computational efficiency of the FFT algorithm, practical 
implementation of the DMT assumes a well-structured transceiver (i.e., pair of transmitter 
and receiver) that is effective in performance and efficient in computational terms. Indeed, 
the DMT has established itself as the standard core technology for designing the 
asymmetric and very high bit-rate members of the digital subscriber line family. 
Moreover, the world-wide deployment of DSL technology has converted an ordinary 
telephone line into a broadband communication link, so much so that we may now view 
the PSTN as a broadband backbone data network. Most importantly, this analog-to-digital 
network conversion has made it possible to transmit data at rates in the megabits per 
second region, which is truly a remarkable engineering achievement. 


Problems 

Nyquist’s Criterion 

The NRZ pulse of Figure P8.1 may be viewed as a very crude form of a Nyquist pulse. Justify this 
statement by comparing the spectral characteristics of these two pulses. 



A binary PAM signal is to be transmitted over a baseband channel with an absolute maximum 
bandwidth of 75 kHz. The bit duration is 10 μs. Find an RC spectrum that satisfies these 
requirements. 

An analog signal is sampled, quantized, and encoded into a binary PCM. Specifications of the PCM 
signal include the following: 

Sampling rate, 8 kHz 

Number of representation levels, 64. 

The PCM signal is transmitted over a baseband channel using discrete PAM. Determine the 
minimum bandwidth required for transmitting the PCM signal if each pulse is allowed to take on the 
following number of amplitude levels: 2, 4, or 8. 

Consider a baseband binary PAM system that is designed to have an RC spectrum P(f). The 
resulting pulse p(t) is defined in (8.25). How would this pulse be modified if the system is designed 
to have a linear phase response? 

Determine the Nyquist pulse whose inverse Fourier transform is defined by the frequency function 
P(f) defined in (8.26). 

Continuing with the defining condition in Problem 8.5, namely 

$$\sum_{n=-\infty}^{\infty} P\left(f + \frac{n}{T_b}\right) = T_b, \qquad T_b > 0$$

demonstrate that the Nyquist pulse p(t) with the narrowest bandwidth is described by the sinc 
function: 

$$p(t) = \mathrm{sinc}\left(\frac{t}{T_b}\right)$$



A pulse p(t) is said to be orthogonal under T-shifts if it satisfies the condition 

$$\int_{-\infty}^{\infty} p(t)\,p(t - nT_b)\,dt = 0 \qquad \text{for } n = \pm 1, \pm 2, \ldots$$

where $T_b$ is the bit duration. In other words, the pulse p(t) is uncorrelated with itself when it is 
shifted by any integer multiple of $T_b$. Show that this condition is satisfied by a Nyquist pulse. 

Let P(f) be an integrable function, the inverse Fourier transform of which is given by 

$$p(t) = \int_{-\infty}^{\infty} P(f)\exp(j2\pi ft)\,df$$

and let $T_b$ be given. The pulse p(t) so defined is a Nyquist pulse of bit duration $T_b$ if, and only if, the 
Fourier transform P(f) satisfies the condition 

$$\sum_{n=-\infty}^{\infty} P\left(f + \frac{n}{T_b}\right) = T_b$$

Using the Poisson sum formula described in Chapter 2, demonstrate the validity of this statement. 

Let g(t) denote a function, the Fourier transform of which is denoted by G(f). The pulse g(t) is 
orthogonal under T-shifts in that its Fourier transform G(f) satisfies the condition 

$$\sum_{n=-\infty}^{\infty}\left|G\left(f + \frac{n}{T_b}\right)\right|^2 = \text{constant}$$

Show that this condition is satisfied by the SRRC shaping pulse. 


Partial Response Signaling 


The sinc pulse is the optimum Nyquist pulse, optimum in the sense that it produces zero intersymbol 
interference while occupying the minimum bandwidth possible, $W = 1/(2T_b)$, where $T_b$ is the bit duration. 
However, as discussed in Section 8.5, the sinc pulse is prone to timing errors; hence the preference 
for the RC spectrum that requires twice the minimum bandwidth, 2W. 

In this problem, we explore a new pulse that achieves the same minimum possible bandwidth $W = 1/(2T_b)$ as 
the sinc pulse, but at the expense of a deterministic (i.e., controlled) intersymbol interference; being 
controllable, appropriate measures can be taken at the receiver to account for it. 

This new pulse is denoted by $g_1(t)$, the Fourier transform of which is defined by 

$$G_1(f) = \begin{cases} 2\cos(\pi fT_b)\exp(-j\pi fT_b), & |f| < 1/(2T_b) \\ 0, & \text{otherwise} \end{cases}$$

Plot the magnitude and phase spectrum of $G_1(f)$. 

Show that the pulse $g_1(t)$ is defined by 

$$g_1(t) = \frac{T_b^2\sin(\pi t/T_b)}{\pi t(T_b - t)}$$

and therefore justify the statement that the tails of $g_1(t)$ decay as $1/|t|^2$, which is faster than the 
rate of decay $1/|t|$ that characterizes the sinc pulse. Comment on this advantage of $g_1(t)$ over the 
sinc pulse. 

Plot the waveform of $g_1(t)$ to demonstrate that $g_1(t)$ has only two distinguishable values at the 
sampling instants; hence the reference to $g_1(t)$ as a duobinary code. 

Signaling over a band-limited channel with the use of a duobinary code is referred to as 
partial-response signaling. Explain why. 


In this problem, we explore another form of partial-response signaling based on the modified 
duobinary code. Let this second code be represented by the pulse g 2 (t) whose Fourier transform is 
defined by 


$$G_2(f) = \begin{cases} 2j\sin(2\pi fT_b)\exp(-j2\pi fT_b), & |f| < 1/(2T_b) \\ 0, & \text{otherwise} \end{cases}$$

Plot the magnitude and phase spectra of $G_2(f)$. 

Show that the modified duobinary pulse is itself defined by 

$$g_2(t) = \frac{2T_b^2\sin(\pi t/T_b)}{\pi t(2T_b - t)}$$

and therefore demonstrate that it has three distinguishable levels at the sampling instants. 

What is a practical advantage of the modified duobinary code over the duobinary code in terms of 
transmission over a band-limited channel? 


Multichannel Line Codes 

Consider the passband basis functions defined in (8.56), where $\phi(t)$ is itself defined by (8.57). 
Demonstrate the validity of Properties 1, 2, and 3 of these passband basis functions. 

The water-filling solution for the loading problem is defined by (8.73) subject to the constraint of 
(8.70). Using this pair of relations, formulate a recursive algorithm for computing the allocation of 
the transmit power P among the N subchannels. The algorithm should start with an initial total or 
sum noise-to-signal ratio NSR(i) = 0 for iteration i = 0, and the subchannels sorted in terms of those 
with the smallest power allocation to the largest. 

The squared magnitude response of a linear channel, denoted by $|H(f)|^2$, is shown in Figure P8.14. 
Assuming that the gap $\Gamma = 1$ and the noise variance $\sigma_n^2 = 1$ for all subchannels, do the following: 

Derive the formulas for the optimum powers $P_1$, $P_2$, and $P_3$, allocated to the three subchannels of 
frequency bands $(0, W_1)$, $(W_1, W_2)$, and $(W_2, W)$. 

Given that the total transmit power P = 10, $l_1 = 2/3$, and $l_2 = 1/3$, calculate the corresponding 
values of $P_1$, $P_2$, and $P_3$. 

In this problem we explore the use of singular value decomposition (SVD) as an alternative to the 
DFT for vector coding. This approach avoids the need for a cyclic prefix, with the channel matrix 
being formulated as 


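$$\mathbf{H} = \begin{bmatrix} h_0 & h_1 & h_2 & \cdots & h_\nu & 0 & \cdots & 0 \\ 0 & h_0 & h_1 & \cdots & h_{\nu-1} & h_\nu & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & h_0 & h_1 & \cdots & h_\nu \end{bmatrix}$$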


where the sequence $h_0, h_1, \ldots, h_\nu$ denotes the sampled impulse response of the channel. The SVD of
the matrix $\mathbf{H}$ is defined by

$$\mathbf{H} = \mathbf{U}[\boldsymbol{\Lambda} : \mathbf{0}_{N,\nu}]\mathbf{V}^\dagger$$

where $\mathbf{U}$ is an N-by-N unitary matrix and $\mathbf{V}$ is an $(N + \nu)$-by-$(N + \nu)$ unitary matrix; that is,

$$\mathbf{U}\mathbf{U}^\dagger = \mathbf{I}, \qquad \mathbf{V}\mathbf{V}^\dagger = \mathbf{I}$$

where $\mathbf{I}$ is the identity matrix and the superscript $\dagger$ denotes Hermitian transposition. The $\boldsymbol{\Lambda}$ is an
N-by-N diagonal matrix with singular values $\lambda_n$, n = 1, 2, ..., N. The $\mathbf{0}_{N,\nu}$ is an N-by-$\nu$ matrix of zeros.
Using this decomposition, show that the N subchannels resulting from the use of vector coding
are mathematically described by

$$X_n = \lambda_n A_n + W_n$$

The $X_n$ is an element of the matrix product $\mathbf{U}^\dagger \mathbf{x}$, where $\mathbf{x}$ is the received signal (channel output)
vector. $A_n$ is the nth symbol $a_n + jb_n$ and $W_n$ is a random variable due to channel noise.

Show that the SNR for vector coding as described herein is given by

$$(\mathrm{SNR})_{\text{vector coding}} = \Gamma\left[\prod_{n=1}^{N^*}\left(1 + \frac{(\mathrm{SNR})_n}{\Gamma}\right)\right]^{1/(N+\nu)} - \Gamma$$

where $N^*$ is the number of channels for each of which the allocated transmit power is
nonnegative, $(\mathrm{SNR})_n$ is the SNR of subchannel n, and $\Gamma$ is a prescribed gap.

As the block length N approaches infinity, the singular values approach the magnitudes of the 
channel Fourier transform. Using this result, comment on the relationship between vector coding 
and discrete multitone. 


Computer Experiments 

In this computer-oriented problem, consisting of two parts, we demonstrate the effect of nonlinearity 
on eye patterns. 

Consider a 4-ary PAM system, operating under idealized conditions: no channel noise and no ISI;
the specifications are as follows:

Nyquist bandwidth, W = 0.5 Hz
Roll-off factor, $\alpha$ = 0.5
Symbol duration, $T = 2T_b$ for M = 4, where $T_b$ is the bit duration.

Compute the eye pattern for this noiseless PAM system.

Repeat the computation, this time assuming that the channel is nonlinear with the following
input-output relationship:

$$x(t) = s(t) + a s^2(t)$$

where s(t) is the channel input and x(t) is the channel output (i.e., received signal); the a is a
constant. Compute the eye pattern for the following three nonlinear conditions:

a = 0.05, 0.1, 0.2

Hence, discuss how varying the constant a affects the shape of the eye pattern for the 4-ary PAM
system.

Notes 


1. The criterion described in (8.11) or (8.15) was first formulated by Nyquist in the study of
telegraph transmission theory; the Nyquist (1928b) paper is a classic. In the literature, this criterion
is referred to as Nyquist’s first criterion. In the 1928b paper, Nyquist described another method,
referred to in the literature as Nyquist’s second criterion. The second method makes use of the
instants of transition between unlike symbols in the received signal rather than centered samples. A
discussion of the first and second criteria is presented in Bennett (1970: 78-92) and in the paper by
Gibby and Smith (1965). A third criterion attributed to Nyquist is discussed in Sunde (1969); see
also the papers by Pasupathy (1974) and Sayar and Pasupathy (1987).

2. The specifications described in Example 1 follow the book by Tranter et al. (2004). 

3. The SRRC pulse shaping is discussed in Chennakeshu and Saulnier (1993) in the context of
$\pi/4$-shifted differential QPSK for digital cellular radio. It is also discussed in Anderson (2005: 27-29).

4. In a strict sense, an eye pattern that is completely open occupies the range from -1 to +1. On this 
basis, zero intersymbol interference would correspond to an ideal eye opening of 2. However, for 
two reasons, convenience of presentation and consistency with the literature, we have chosen an eye 
opening of unity to refer to the ideal condition of zero intersymbol interference. 

5. For a detailed treatment of decision feedback equalizers, see the fifth edition of the classic book 
on Digital Communications by Proakis and Salehi (2008). 

6. The idea of an ADSL is attributed to Lechleider (1989) in having had the insight that such an 
arrangement offers the possibility of more than doubling the information capacity of a symmetric 
arrangement. 

7. For a detailed discussion of VDSL, see Chapter 7 of the book by Starr et al. (2003); see also the 
paper by Cioffi et al. (1999). 

8. The method of Lagrange multipliers is discussed in Appendix D. 
Introduction


In Chapters 7 and 8 we studied signaling over AWGN and band-limited channels, respectively.
In this chapter we go on to study a more complicated communications environment,
namely a fading channel, which is at the very core of ever-expanding wireless communications.
Fading refers to the fact that even though the distance separating a mobile receiver
from the transmitter is essentially constant, a relatively small movement of the receiver
away from the transmitter could result in a significant change in the received power. The
physical phenomenon responsible for fading is multipath, which means that the transmitted
signal reaches the mobile receiver via multiple paths with varying spatio-temporal
characteristics, hence the challenging nature of the wireless channel for reliable communication.

This chapter consists of three related parts: 

First we study signaling over a fading channel by characterizing its statistical behavior
in temporal as well as spatial terms. This statistical characterization is carried out from
three different perspectives: physical, mathematical, and computational, each of which
enriches our understanding of the multipath phenomenon in its own way. This first part of
the chapter finishes with:

• BER comparison of different modulation schemes for AWGN and Rayleigh fading 
channels. 

• Graphical display of how different fading channels compare to a corresponding 
AWGN channel using binary PSK. 

This evaluation then prompts the issue of how to combat the degrading effect of multipath 
and thereby realize reliable communication over a fading channel. Indeed, the second part 
of the chapter is devoted to this important practical issue. Specifically, we study the use of 
space diversity, which can be one of three kinds: 

1. Diversity-on-receive, which involves the use of a single transmitter and multiple
receivers, with each receiver having its own antenna.

2. Diversity-on-transmit, which involves the use of multiple transmitting antennas and
a single receiver.

3. Multiple-input, multiple-output (MIMO) antenna system, which includes diversity
on receive and diversity on transmit in a combined manner.

The use of diversity-on-receive techniques is of long standing in the study of radio
communications. On the other hand, diversity-on-transmit and MIMO antenna systems are
of recent origin. The study of diversity is closely related to that of information capacity,
the evaluation of which is also given special attention in the latter part of the chapter.

For the third and final part of the chapter, we study spread-spectrum signals, which 
provide the basis of another novel way of thinking about how to mitigate the degrading 
effects of the multipath phenomenon. In more specific terms, the use of spread-spectrum 
signaling leads to the formulation of code-division multiple access, a topic that was 
covered briefly in the introductory Chapter 1.

Propagation Effects 


The major propagation problems encountered in the use of mobile radio in built-up areas 
are due to the fact that the antenna of a mobile unit may lie well below the surrounding 
buildings. Simply put, there is no “line-of-sight” path to the base station. Instead, radio 
propagation takes place mainly by way of scattering from the surfaces of the surrounding 
buildings and by diffraction over and/or around them, as illustrated in Figure 9.1. The 
important point to note from Figure 9.1 is that energy reaches the receiving antenna via 
more than one path. Accordingly, we speak of a multipath phenomenon, in that the various 
incoming radio waves reach their destination from different directions and with different 
time delays. 

To understand the nature of the multipath phenomenon, consider first a “static” 
multipath environment involving a stationary receiver and a transmitted signal that 
consists of a narrowband signal (e.g., unmodulated sinusoidal carrier). Let it be assumed 
that two attenuated versions of the transmitted signal arrive sequentially at the receiver. 
The effect of the differential time delay is to introduce a relative phase shift between any 
two components of the received signal. We may then identify one of two extreme cases 
that can arise: 

• The relative phase shift is zero, in which case the two components add 
constructively, as illustrated in Figure 9.2a. 

• The relative phase shift is 180°, in which case the two components add destructively,
as illustrated in Figure 9.2b.

Figure 9.1 Illustrating the mechanism of radio propagation in urban areas.

Figure 9.2 (a) Constructive and (b) destructive forms of the multipath phenomenon for
sinusoidal signals.


We may also use phasors to demonstrate the constructive and destructive effects of
multipath, as shown in Figures 9.3a and 9.3b, respectively. Note that, in the static 
multipath environment described herein, the amplitude of the received signal does not vary 
with time. 

Consider next a “dynamic” multipath environment in which the receiver is in motion and
two versions of the transmitted narrowband signal reach the receiver via paths of different
lengths. Owing to motion of the receiver, there is a continuous change in the length of each
propagation path. Hence, the relative phase shift between the two components of the
received signal is a function of spatial location of the receiver. As the receiver moves, we
now find that the received amplitude (envelope) is no longer constant, as was the case in a
static environment; rather, it varies with distance, as illustrated in Figure 9.4. At the top of
this figure, we have also included the phasor relationships for two components of the
received signal at various locations of the receiver. Figure 9.4 shows that there is
constructive addition at some locations and almost complete cancellation at some other
locations. This physical phenomenon is referred to as fast fading.

Figure 9.3 Phasor representations of (a) constructive and (b) destructive forms of multipath.

Figure 9.4 Illustrating how the envelope fades as two incoming signals combine with
different phases (horizontal axis: distance).
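The two-path picture described above can be sketched numerically. The following is a minimal illustration, not a model taken from the text: the amplitudes, the wavelength, and the parameter `path_slope` (how fast the differential path length changes as the receiver moves) are all assumed values chosen only to make the constructive and destructive cases visible.

```python
import numpy as np

def two_path_envelope(distance_m, wavelength_m, a1=1.0, a2=0.9, path_slope=1.0):
    """Envelope of two narrowband components whose relative phase varies
    with receiver position.

    path_slope (an assumed, illustrative parameter) is the change in the
    differential path length per metre travelled by the receiver.
    """
    delta_l = path_slope * np.asarray(distance_m, dtype=float)
    relative_phase = 2.0 * np.pi * delta_l / wavelength_m
    # Phasor sum of the two components; its magnitude is the envelope.
    return np.abs(a1 + a2 * np.exp(1j * relative_phase))

wavelength = 0.3                          # metres (about a 1 GHz carrier)
positions = np.linspace(0.0, 1.0, 1001)   # receiver positions along its track
envelope = two_path_envelope(positions, wavelength)
print(f"strongest envelope: {envelope.max():.3f}")
print(f"deepest fade:       {envelope.min():.3f}")
```

With these assumed amplitudes the envelope swings between the sum and the difference of the two component amplitudes, and the fades repeat once per wavelength of differential path change.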

In a mobile radio environment encountered in practice, there may of course be a 
multitude of propagation paths with different lengths and their contributions to the received 
signal could combine in a variety of ways. The net result is that the envelope of the received 
signal varies with location in a complicated fashion, as shown by the experimental record of 
received signal envelope in an urban area that is presented in Figure 9.5. This figure clearly 
displays the fading nature of the received signal. The received signal envelope in Figure 9.5 
is measured in dBm. The unit dBm is defined as $10\log_{10}(P/P_0)$, with P denoting the power
being measured and $P_0$ = 1 mW as the frame of reference. In the case of Figure 9.5, P is the
instantaneous power in the received signal envelope. 
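As a small worked illustration of the dBm definition, the conversion can be written as a helper function. The function name is hypothetical and the sample powers are arbitrary.

```python
import math

def power_to_dbm(power_watts):
    """Convert a power in watts to dBm (decibels relative to 1 mW)."""
    reference_watts = 1e-3
    return 10.0 * math.log10(power_watts / reference_watts)

for p in (1e-3, 1e-6, 1e-9):
    print(f"{p:.0e} W -> {power_to_dbm(p):.1f} dBm")
```

A power of 1 mW maps to 0 dBm, and each reduction of the power by a factor of 1000 lowers the reading by 30 dB.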

Figure 9.5 Experimental record of received signal envelope in an urban area.

Signal fading is essentially a spatial phenomenon that manifests itself in the time
domain as the receiver moves. These variations can be related to the motion of the receiver
as follows. Consider the situation illustrated in Figure 9.6, where the receiver is assumed
to be moving along the line AA' with a constant velocity v. It is also assumed that the
received signal is due to a radio wave from a scatterer labelled S. Let $\Delta t$ denote the time
taken for the receiver to move from point A to A'. Using the notation described in Figure
9.6, the incremental change in the path length of the radio wave is deduced to be

$$\Delta l = d\cos\psi = -v\,\Delta t\cos\psi$$

where $\psi$ is the spatial angle subtended between the incoming radio wave and the direction
of motion of the receiver. Correspondingly, the change in the phase angle of the received
signal at point A' with respect to that at point A is given by

$$\Delta\phi = \frac{2\pi}{\lambda}\Delta l = -\frac{2\pi v\,\Delta t}{\lambda}\cos\psi$$

where $\lambda$ is the radio wavelength. The apparent change in frequency, or the Doppler shift, is
therefore defined by

$$\nu = -\frac{1}{2\pi}\frac{\Delta\phi}{\Delta t} = \frac{v}{\lambda}\cos\psi \tag{9.2}$$

The Doppler shift $\nu$ is positive (resulting in an increase in frequency) when the radio
waves arrive from ahead of the mobile unit and it is negative when the radio waves arrive
from behind the mobile unit.

Figure 9.6 Illustrating the calculation of Doppler shift (the sketch shows the scatterer S and
the direction of motion of the receiver).
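To put numbers on the Doppler-shift formula, the short sketch below evaluates the shift $(v/\lambda)\cos\psi$ for a few arrival angles. The vehicle speed and carrier frequency are illustrative assumptions, not values from the text, and the function name is hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def doppler_shift_hz(speed_mps, carrier_hz, angle_deg):
    """Doppler shift (v / wavelength) * cos(psi) for arrival angle psi."""
    wavelength = SPEED_OF_LIGHT / carrier_hz
    return (speed_mps / wavelength) * math.cos(math.radians(angle_deg))

speed = 30.0        # m/s (108 km/h), an illustrative value
carrier = 900e6     # Hz, an illustrative value
for angle in (0.0, 60.0, 90.0, 180.0):
    shift = doppler_shift_hz(speed, carrier, angle)
    print(f"psi = {angle:5.1f} deg -> nu = {shift:8.2f} Hz")
```

Waves arriving from ahead (small angles) give a positive shift, waves arriving broadside give essentially none, and waves arriving from behind give a negative shift, in line with the sign discussion above.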




Jakes Model 


To illustrate fast fading due to a moving receiver, consider a dynamic multipath
environment that involves N iid fixed scatterers surrounding such a receiver. Let the
transmitted signal be the complex sinusoidal function of unit amplitude and frequency $f_c$,
as shown by

$$s(t) = \exp(j2\pi f_c t)$$

Then, the composite signal observed at the moving receiver, including relative effects of a
Doppler shift, is given by

$$x_0(t) = \sum_{n=1}^{N} A_n \exp[j2\pi(f_c + \nu_n)t + j\theta_n]$$

where the amplitude $A_n$ is contributed by the nth scatterer, $\nu_n$ is the corresponding Doppler
shift, and $\theta_n$ is some random phase. The complex envelope of the received signal is time
varying, as shown by

$$\tilde{x}_0(t) = \sum_{n=1}^{N} A_n \exp[j2\pi\nu_n t + j\theta_n] \tag{9.3}$$

Correspondingly, the autocorrelation function of the complex envelope $\tilde{x}_0(t)$ is defined by

$$R_{\tilde{x}_0}(\tau) = E[\tilde{x}_0^*(t)\,\tilde{x}_0(t+\tau)] \tag{9.4}$$

where E is the expectation operator with respect to time t and the asterisk in $\tilde{x}_0^*(t)$
denotes complex conjugation. Inserting (9.3) in (9.4) leads to a double summation, one
indexed by n and the other indexed by m. Then, simplifying the result under the iid
assumption, the autocorrelation function $R_{\tilde{x}_0}(\tau)$ reduces to

$$R_{\tilde{x}_0}(\tau) = \begin{cases} \displaystyle\sum_{n=1}^{N} E[A_n^2 \exp(j2\pi\nu_n\tau)], & \text{if } m = n \\ 0, & \text{if } m \ne n \end{cases} \tag{9.5}$$

At this point in the discussion, we make two observations:

1. The effects of small changes in distances between the moving receiver and the nth
scatterer are small enough for all n for us to write

$$E[A_n^2 \exp(j2\pi\nu_n\tau)] = E[A_n^2]\,E[\exp(j2\pi\nu_n\tau)] \tag{9.6}$$

where n = 1, 2, ..., N.

2. The Doppler shift $\nu_n$ is proportional to the cosine of the angle $\psi_n$ subtended
between the incoming radio wave from the nth scatterer and the direction of motion
of the receiver in Figure 9.6, which follows from (9.2).

We may therefore write

$$\nu_n = \nu_{\max}\cos\psi_n, \qquad n = 1, 2, \ldots, N \tag{9.7}$$

where $\nu_{\max}$ is the maximum Doppler shift that occurs when the incoming radio waves
propagate in the same direction as the motion of the receiver. Accordingly, using (9.6) and
(9.7) in (9.5), we may write

$$R_{\tilde{x}_0}(\tau) = \begin{cases} P_0 \displaystyle\sum_{n=1}^{N} E[\exp(j2\pi\nu_{\max}\tau\cos\psi_n)], & \text{for } m = n \\ 0, & \text{for } m \ne n \end{cases} \tag{9.8}$$

where the multiplying factor

$$P_0 = \sum_{n=1}^{N} E[A_n^2]$$

is the average signal power at the receiver input.

We now make two final assumptions:

1. All the radio waves arrive at the receiver from a horizontal direction (Clarke, 1968).

2. The multipath is uniformly distributed over the range $[-\pi, \pi]$, as shown by the
probability density function (Jakes, 1974):

$$f_\Psi(\psi) = \begin{cases} \dfrac{1}{2\pi}, & -\pi \le \psi \le \pi \\ 0, & \text{otherwise} \end{cases}$$

Under these two assumptions, the remaining expectation in (9.8) becomes independent of
n and with it, that equation simplifies further as follows:

$$\begin{aligned} R_{\tilde{x}_0}(\tau) &= P_0 E[\exp(j2\pi\nu_{\max}\tau\cos\psi)], \qquad -\pi \le \psi \le \pi \\ &= P_0 \int_{-\pi}^{\pi} f_\Psi(\psi)\exp(j2\pi\nu_{\max}\tau\cos\psi)\,d\psi \\ &= P_0\left[\frac{1}{2\pi}\int_{-\pi}^{\pi}\exp(j2\pi\nu_{\max}\tau\cos\psi)\,d\psi\right] \end{aligned}$$

The definite integral inside the brackets of this equation is recognized as the Bessel
function of the first kind of order zero; see Appendix C. By definition, for some argument
x, we have

$$J_0(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\exp(jx\cos\theta)\,d\theta$$

We may therefore express the autocorrelation function of the complex signal $\tilde{x}_0(t)$ at the
input of the moving receiver in the compact form

$$R_{\tilde{x}_0}(\tau) = P_0 J_0(2\pi\nu_{\max}\tau) \tag{9.12}$$

The model described by the autocorrelation function of (9.12) is called the Jakes model.
Figure 9.7a shows a plot of the autocorrelation $R_{\tilde{x}_0}(\tau)$ according to this model.

Figure 9.7 (a) Autocorrelation of the complex envelope of the received signal according to
the Jakes model. (b) Power spectrum of the fading process for the Jakes model.
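The step from the expectation over a uniformly distributed arrival angle to the Bessel function can be checked numerically. The sketch below is only a sanity check under stated assumptions: it estimates $E[\exp(j2\pi\nu_{\max}\tau\cos\psi)]$ by Monte Carlo averaging and compares it with $J_0$ evaluated from its power series. The helper names, the number of simulated paths, and the choice $\nu_{\max}$ = 100 Hz are illustrative.

```python
import math
import numpy as np

def bessel_j0_series(x, terms=40):
    """J0(x) from its power series, adequate for moderate |x|."""
    return sum((-1) ** k * (x / 2.0) ** (2 * k) / math.factorial(k) ** 2
               for k in range(terms))

def jakes_autocorrelation_mc(tau, nu_max, p0=1.0, n_paths=200_000, seed=0):
    """Monte Carlo estimate of P0 * E[exp(j 2 pi nu_max tau cos(psi))]
    with psi uniformly distributed over [-pi, pi]."""
    rng = np.random.default_rng(seed)
    psi = rng.uniform(-np.pi, np.pi, n_paths)
    return p0 * np.mean(np.exp(1j * 2.0 * np.pi * nu_max * tau * np.cos(psi)))

nu_max = 100.0  # Hz, illustrative
for tau in (0.0, 1e-3, 2e-3, 5e-3):
    estimate = jakes_autocorrelation_mc(tau, nu_max)
    reference = bessel_j0_series(2.0 * math.pi * nu_max * tau)
    print(f"tau = {tau * 1e3:4.1f} ms  simulated = {estimate.real:+.4f}  J0 = {reference:+.4f}")
```

The imaginary part of the estimate should stay near zero, since the uniform angle distribution is symmetric about the direction of motion.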



According to the Wiener-Khintchine relations for a weakly (wide-sense) stationary
process (discussed in Chapter 4), the autocorrelation function and power spectrum form a
Fourier-transform pair. Specifically, we may write

$$S_{\tilde{x}_0}(\nu) = \mathbf{F}[P_0 J_0(2\pi\nu_{\max}\tau)]$$

At first sight, it might seem that a closed-form solution of this transformation is
mathematically intractable; in reality, however, the exact solution is given in (Jakes, 1974):

$$S_{\tilde{x}_0}(\nu) = \begin{cases} \dfrac{P_0}{\pi\nu_{\max}\sqrt{1 - (\nu/\nu_{\max})^2}}, & \text{for } |\nu| < \nu_{\max} \\ 0, & \text{for } |\nu| > \nu_{\max} \end{cases} \tag{9.14}$$

and with it the model bears his name. Figure 9.7b plots the power spectrum in (9.14)
versus the Doppler shift $\nu$ for $P_0 = 1$. This idealized graph has the shape of a “bathtub,”
exhibiting two symmetric integrable singularities at the end points $\nu = \pm\nu_{\max}$.
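To see why the end-point singularities are integrable, one can integrate the spectrum directly. The short derivation below assumes the normalized form of the spectrum, namely the Fourier transform of $P_0 J_0(2\pi\nu_{\max}\tau)$ written as $P_0 / (\pi\nu_{\max}\sqrt{1-(\nu/\nu_{\max})^2})$, and uses the substitution $\nu = \nu_{\max}\sin\theta$.

```latex
\begin{aligned}
\int_{-\nu_{\max}}^{\nu_{\max}}
  \frac{P_0}{\pi\nu_{\max}\sqrt{1-(\nu/\nu_{\max})^2}}\,d\nu
&= \int_{-\pi/2}^{\pi/2}
  \frac{P_0}{\pi\nu_{\max}\cos\theta}\,\nu_{\max}\cos\theta\,d\theta \\
&= \frac{P_0}{\pi}\int_{-\pi/2}^{\pi/2} d\theta \\
&= P_0
\end{aligned}
```

The total area is finite and equal to $P_0$, which agrees with the value of the autocorrelation function at zero lag, $P_0 J_0(0) = P_0$.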

Jakes Model Implemented as a FIR Filter 

The objective of this example is to compute a FIR (TDL) filter that models the power
spectrum of (9.14). To this end, we make use of the following relationships in light of
material covered in Chapter 4 on stochastic processes:

1. The autocorrelation function and power spectrum of a weakly stationary process
form a Fourier-transform pair, as already mentioned.

2. In terms of stochastic processes, the input-output behavior of a linear system, in the
frequency domain, is described by

$$S_Y(f) = |H(f)|^2 S_X(f) \tag{9.15}$$

where H(f) is the transfer function of the system, $S_X(f)$ is the power spectrum of the
input process X(t), and $S_Y(f)$ is the power spectrum of the output process Y(t), both
being weakly stationary.

3. If the input process X(t) is Gaussian, then the output process Y(t) is also Gaussian.

4. If the input X(t) is uncorrelated, then the output Y(t) will be correlated due to
dispersive behavior of the system.

The issue at hand is to find the H(f) required to produce the desired power spectrum of
(9.14) using a white noise process of spectral density $N_0/2$ as the input process X(t). Then,
given the $S_Y(f)$ and setting the constant $K = N_0/2$, we may solve (9.15) for H(f), obtaining

$$H(f) = \sqrt{\frac{S_Y(f)}{K}} \tag{9.16}$$

In other words, H(f) is proportional to the square root of $S_Y(f)$. (From a practical
perspective, the constant K is determined by truncating the power-delay profile, an issue
deferred to Section 9.14.)

In light of (9.14) and (9.16), we may now say that the H(f) representing the desired
Jakes FIR filter is given by (ignoring the constant K)

$$H(f) = \begin{cases} (1 - f'^2)^{-1/4}, & \text{for } -1 < f' < 1 \\ 0, & \text{otherwise} \end{cases} \tag{9.17}$$

where $f' = \nu/\nu_{\max}$. Given this formula, we may then use inverse Fourier transformation
to compute the corresponding impulse response of the Jakes FIR filter.

However, before proceeding further, an important aspect of using the Jakes model to
simulate a fading channel is to pay particular attention to the following point: the sampling
rate of the signal and the sampling rate of the fading process are quite different.
To be specific, the former is a multiple of the symbol rate and the latter is a multiple of the
Doppler bandwidth, $\nu_{\max}$. In other words, the signal sampling rate is much larger than $\nu_{\max}$. It
follows therefore that a multiple sampling rate with interpolation must be used in the
simulation; the need for interpolation is to go from a discrete spectrum to its continuous
version.

With this point in mind, a 512-point inverse FFT algorithm is applied to the transfer
function of (9.17) for the following set of specifications:

maximum Doppler shift, $\nu_{\max}$ = 100 Hz
sampling frequency, $f_s = 16\nu_{\max}$

We thus obtain the discrete-time version of the truncated impulse response $h_n$ of the Jakes
FIR filter plotted in Figure 9.8a.

Having computed $h_n$, we may go on to use the FFT algorithm to compute the
corresponding transfer function H(f) of the Jakes FIR filter; the result of this computation
is plotted in Figure 9.8b, which has a bathtub-like shape of its own, as expected.

Figure 9.8 Jakes FIR filter. (a) Discrete impulse response. (b) Interpolated power spectral
density (PSD).
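The design procedure of this example can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions rather than the computation behind Figure 9.8: the transfer function is sampled as $(1 - f'^2)^{-1/4}$ inside the Doppler band, the grid point that falls exactly on $\nu_{\max}$ is set to zero, and a Hamming window (an assumption, since the text does not say how the impulse response is truncated) tapers the taps. The function name `jakes_fir` is hypothetical.

```python
import numpy as np

def jakes_fir(nu_max=100.0, oversample=16, n_fft=512):
    """Windowed inverse-FFT design of a FIR filter with a Jakes-shaped response."""
    fs = oversample * nu_max                          # sampling frequency, Hz
    f_norm = np.fft.fftfreq(n_fft, d=1.0 / fs) / nu_max
    H = np.zeros(n_fft)
    inside = np.abs(f_norm) < 1.0                     # strictly inside the Doppler band
    H[inside] = (1.0 - f_norm[inside] ** 2) ** -0.25  # square root of the bathtub PSD
    h = np.real(np.fft.ifft(H))                       # H is real and even, so h is real
    h = np.fft.fftshift(h)                            # centre the impulse response
    h *= np.hamming(n_fft)                            # assumed tapering window
    h /= np.sqrt(np.sum(h ** 2))                      # normalize to unit energy
    return h, fs

nu_max = 100.0
h, fs = jakes_fir(nu_max=nu_max)
freqs = np.fft.fftfreq(h.size, d=1.0 / fs)
psd = np.abs(np.fft.fft(h)) ** 2                      # response of the truncated filter
in_band = np.abs(freqs) <= 1.2 * nu_max
print(f"taps: {h.size}, sampling rate: {fs:.0f} Hz")
print(f"fraction of power within 1.2 * nu_max: {psd[in_band].sum() / psd.sum():.4f}")
print(f"PSD peak at |f| = {abs(freqs[np.argmax(psd)]):.1f} Hz")
```

The unit-energy normalization is a convenience: when the filter is driven by unit-variance white noise, the output then has unit average power.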


Illustrative Generation of Fading Process Using the Jakes FIR Filter 

To expand the practical utility of the Jakes FIR filter computed in Example 1 to simulate
the fading process, the next thing we do is to pass a complex white noise process through
the filter, with the noise having uncorrelated samples. Figure 9.9a displays the power
spectrum of the resulting stochastic process at the filter output. Figure 9.9b shows the
envelope of the output process, plotted on a logarithmic scale. This plot is typical of a
fading correlated signal.

Figure 9.9 Jakes FIR filter driven by white Gaussian noise. (a) Output power spectrum.
(b) Envelope of the output process.
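The experiment described in this example can be reproduced along the following lines. The sketch is self-contained and therefore repeats a compact filter design; the Hamming window, the unit-energy normalization, the random seed, and the number of output samples are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def jakes_taps(nu_max, fs, n_fft=512):
    """Windowed inverse-FFT design of a Jakes-shaped FIR filter (unit energy)."""
    f_norm = np.fft.fftfreq(n_fft, d=1.0 / fs) / nu_max
    H = np.zeros(n_fft)
    inside = np.abs(f_norm) < 1.0
    H[inside] = (1.0 - f_norm[inside] ** 2) ** -0.25
    h = np.fft.fftshift(np.real(np.fft.ifft(H))) * np.hamming(n_fft)
    return h / np.sqrt(np.sum(h ** 2))

def simulate_fading(n_samples, nu_max=100.0, oversample=16, seed=0):
    """Filter unit-variance complex white Gaussian noise with the Jakes taps."""
    fs = oversample * nu_max
    h = jakes_taps(nu_max, fs)
    rng = np.random.default_rng(seed)
    n_in = n_samples + h.size - 1      # extra samples absorb the filter transient
    noise = (rng.standard_normal(n_in) + 1j * rng.standard_normal(n_in)) / np.sqrt(2.0)
    return np.convolve(noise, h, mode="valid"), fs

fading, fs = simulate_fading(20_000)
envelope_db = 20.0 * np.log10(np.abs(fading))
print(f"mean power: {np.mean(np.abs(fading) ** 2):.3f}")
print(f"deepest fade: {envelope_db.min():.1f} dB over {fading.size / fs:.1f} s")
```

Plotting `envelope_db` against time gives a trace with occasional deep nulls, which is the qualitative behavior the text attributes to a correlated fading signal.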


Statistical Characterization of Wideband Wireless Channels 


Physical characterization of the multipath environment described in Section 9.3 is 
appropriate for narrowband mobile radio transmissions where the signal bandwidth is 
small compared with the reciprocal of the spread in propagation path delays. 

However, in real-life situations, we find that the signals radiated in a mobile radio
environment occupy a wide bandwidth, such that statistical characterization of the wireless
channel requires more detailed mathematical considerations, which is the objective of this
section. To this end, we follow the complex notations described in Chapter 2 to simplify
the analysis.

To be specific, we may express the transmitted band-pass signal as follows:

$$x(t) = \mathrm{Re}[\tilde{x}(t)\exp(j2\pi f_c t)]$$

where $\tilde{x}(t)$ is the complex (low-pass) envelope of x(t) and $f_c$ is the carrier frequency. Since
the channel is time varying due to multipath effects, the impulse response of the channel is
delay dependent and, therefore, a time-varying function. Let the impulse response of the
channel be expressed as

$$h(\tau;t) = \mathrm{Re}[\tilde{h}(\tau;t)\exp(j2\pi f_c\tau)]$$

where $\tilde{h}(\tau;t)$ is the complex low-pass impulse response of the channel and $\tau$ is a delay
variable. The complex low-pass impulse response $\tilde{h}(\tau;t)$ is called the delay-spread
function of the channel. Correspondingly, the complex low-pass envelope of the channel
output, namely $\tilde{y}(t)$, is defined by the convolution integral

$$\tilde{y}(t) = \frac{1}{2}\int_{-\infty}^{\infty}\tilde{h}(\tau;t)\,\tilde{x}(t-\tau)\,d\tau$$

where the scaling factor 1/2 is the result of using complex notation; see Chapter 2 for
details. To be generic, the $\tilde{x}_0(t)$ in Section 9.2 has been changed to $\tilde{x}(t)$.

In general, the behavior of a mobile radio channel can be described only in statistical
terms. For analytic purposes and mathematical tractability, the delay-spread function
$\tilde{h}(\tau;t)$ is modeled as a zero-mean complex-valued Gaussian process. Then, at any time t
the envelope $|\tilde{h}(\tau;t)|$ is Rayleigh distributed and the channel is therefore referred to as a
Rayleigh fading channel. When, however, the mobile radio environment includes fixed
scatterers, we are no longer justified in using a zero-mean model to describe the
delay-spread function $\tilde{h}(\tau;t)$. In such a case, it is more appropriate to use a Rician distribution
to describe the envelope $|\tilde{h}(\tau;t)|$ and the channel is referred to as a Rician fading channel.
The Rayleigh and Rician distributions for a real-valued stochastic process were considered
in Chapter 3. In the discussion presented in this chapter we focus largely, but not
completely, on a Rayleigh fading channel.
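The distinction between the two envelope distributions can be illustrated by drawing samples. The sketch below is an illustration only: it treats a single delay at a single time instant, uses a unit standard deviation per real dimension, and adds an arbitrary fixed component of amplitude 2 to obtain the nonzero-mean case. These values are assumptions, not parameters from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
sigma = 1.0                       # standard deviation per real dimension

# Zero-mean complex Gaussian samples: the envelope is Rayleigh distributed.
diffuse = sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
rayleigh_envelope = np.abs(diffuse)

# Adding a fixed (non-random) component gives a nonzero mean: Rician envelope.
fixed_component = 2.0             # illustrative amplitude, an assumption
rician_envelope = np.abs(fixed_component + diffuse)

print(f"Rayleigh mean envelope: {rayleigh_envelope.mean():.3f} "
      f"(theory {sigma * np.sqrt(np.pi / 2.0):.3f})")
print(f"Rician mean-square:     {np.mean(rician_envelope ** 2):.3f} "
      f"(theory {fixed_component ** 2 + 2.0 * sigma ** 2:.3f})")
```

The theoretical values quoted in the printout are the standard Rayleigh mean $\sigma\sqrt{\pi/2}$ and the Rician mean-square value, which is the power of the fixed component plus the diffuse power $2\sigma^2$.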


The time-varying transfer function of the channel is defined as the Fourier transform of the
delay-spread function $\tilde{h}(\tau;t)$ with respect to the delay variable $\tau$, as shown by

$$\tilde{H}(f;t) = \int_{-\infty}^{\infty}\tilde{h}(\tau;t)\exp(-j2\pi f\tau)\,d\tau \tag{9.21}$$

where f denotes the frequency variable. The time-varying transfer function $\tilde{H}(f;t)$ may be
viewed as a frequency transmission characteristic of the channel.
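For a channel made up of a few discrete paths, the delay-spread function at a fixed time is a sum of weighted impulses, and the Fourier integral collapses to a finite sum, $\sum_k a_k \exp(-j2\pi f\tau_k)$. The sketch below evaluates that sum for a hypothetical two-path channel at one time instant; the gains and delays are illustrative assumptions.

```python
import numpy as np

def transfer_function(freqs_hz, gains, delays_s):
    """Transfer function of a discrete multipath channel at a fixed time t:
    the sum over paths of gain_k * exp(-j 2 pi f delay_k)."""
    freqs = np.asarray(freqs_hz, dtype=float)[:, None]
    gains = np.asarray(gains, dtype=complex)[None, :]
    delays = np.asarray(delays_s, dtype=float)[None, :]
    return np.sum(gains * np.exp(-2j * np.pi * freqs * delays), axis=1)

# Two equal-strength paths separated by 1 microsecond (assumed values).
gains = [1.0, 1.0]
delays = [0.0, 1e-6]
freqs = np.array([0.0, 250e3, 500e3])   # frequencies relative to the carrier
H = transfer_function(freqs, gains, delays)
for f, value in zip(freqs, H):
    print(f"f = {f / 1e3:6.1f} kHz  |H| = {abs(value):.4f}")
```

The magnitude varies with frequency and falls to a null where the two paths arrive in antiphase, which is the frequency-selective behavior that motivates the wideband characterization.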

For a mathematically tractable statistical characterization of the channel, we make two 
assumptions motivated by physical considerations; hence the practical importance of the 
model resulting from these two assumptions. 


Assumption 1: Wide-Sense Stationarity


As explained in Chapter 4, a stochastic process is said to be wide-sense (i.e., weakly)
stationary if its mean is time independent and its autocorrelation function is dependent
only on the difference between two time instants at which the process is observed. In what
follows we use the “wide-sense stationary” terminology because of its common use in the
wireless literature.

In the context of the discussion presented herein, this first assumption means that

• The expectation of $\tilde{h}(\tau;t)$ with respect to time t is dependent only on the delay $\tau$.

• Insofar as time t is concerned, the expectation of the product $\tilde{h}^*(\tau_1;t_1)\,\tilde{h}(\tau_2;t_2)$
is dependent only on the time difference $\Delta t = t_2 - t_1$.

Because Fourier transformation is a linear operation, it follows that if the complex
delay-spread function $\tilde{h}(\tau;t)$ is a zero-mean Gaussian wide-sense stationary process, then the
complex time-varying transfer function $\tilde{H}(f;t)$ has similar statistics.

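Assumption 2: Uncorrelated Scattering


The channel is said to be an uncorrelated-scattering channel if the contributions from
scatterers with different propagation delays are uncorrelated.
In other words, the second-order expectation with respect to time t satisfies the
requirement

$$E[\tilde{h}^*(\tau_1;t_1)\,\tilde{h}(\tau_2;t_2)] = E[\tilde{h}^*(\tau_1;t_1)\,\tilde{h}(\tau_1;t_2)]\,\delta(\tau_1 - \tau_2)$$

where $\delta(\tau_1 - \tau_2)$ is a Dirac-delta function defined in the delay domain. That is, the
autocorrelation function of $\tilde{h}(\tau;t)$ is nonzero only when $\tau_2 = \tau_1$.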

In the literature on statistical characterization of wireless channels, wide-sense 
stationarity is abbreviated as WSS and uncorrelated scattering is abbreviated as US. Thus, 
when both Assumptions 1 and 2 are satisfied simultaneously, the resulting channel model 
is said to be the WSSUS model. 

Consider then the correlation function of the delay-spread function $\tilde{h}(\tau;t)$. Since
$\tilde{h}(\tau;t)$ is complex valued, we use the following definition for the correlation function:

$$R_{\tilde{h}}(\tau_1, t_1; \tau_2, t_2) = E[\tilde{h}^*(\tau_1;t_1)\,\tilde{h}(\tau_2;t_2)] \tag{9.22}$$

where E is the statistical expectation operator, the asterisk denotes complex conjugation, $\tau_1$
and $\tau_2$ are propagation delays of the two paths involved in the calculation, and $t_1$ and $t_2$ are
the times at which the outputs of the two paths are observed. Under the combined WSSUS
channel model, we may reformulate the correlation function in (9.22) as shown by

$$\begin{aligned} R_{\tilde{h}}(\tau_1, \tau_2; \Delta t) &= E[\tilde{h}^*(\tau_1;t)\,\tilde{h}(\tau_2;t + \Delta t)] \\ &= r_{\tilde{h}}(\tau_1;\Delta t)\,\delta(\tau_1 - \tau_2) \end{aligned} \tag{9.23}$$

where $\Delta t$ is the difference between the observation times $t_1$ and $t_2$ and $\delta(\tau_1 - \tau_2)$ is the delta
function in the $\tau$-domain. Thus, using $\tau$ in place of $\tau_1$ for mathematical convenience, the
function in the second line of (9.23) is redefined as

$$r_{\tilde{h}}(\tau;\Delta t) = E[\tilde{h}^*(\tau;t)\,\tilde{h}(\tau;t + \Delta t)] \tag{9.24}$$

The function $r_{\tilde{h}}(\tau;\Delta t)$ is called the multipath correlation profile of the channel. This new
correlation function $r_{\tilde{h}}(\tau;\Delta t)$ provides a statistical measure of the extent to which the
signal is distorted in the time domain as a result of transmission through the channel.

Consider next statistical characterization of the channel in terms of the complex
time-varying transfer function $\tilde{H}(f;t)$. Following a formulation similar to that described in
(9.22), the correlation function of $\tilde{H}(f;t)$ is defined by

$$R_{\tilde{H}}(f_1, t_1; f_2, t_2) = E[\tilde{H}^*(f_1;t_1)\,\tilde{H}(f_2;t_2)] \tag{9.25}$$

where $f_1$ and $f_2$ represent two frequencies in the spectrum of the transmitted signal. The
correlation function $R_{\tilde{H}}(f_1, t_1; f_2, t_2)$ provides a statistical measure of the extent to which
the signal is distorted in the frequency domain by transmission through the channel. From
(9.21), (9.22), and (9.25), it is apparent that the correlation functions $R_{\tilde{H}}(f_1, t_1; f_2, t_2)$ and
$R_{\tilde{h}}(\tau_1, t_1; \tau_2, t_2)$ form a two-dimensional Fourier-transform pair, defined as follows:

$$R_{\tilde{H}}(f_1, t_1; f_2, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R_{\tilde{h}}(\tau_1, t_1; \tau_2, t_2)\exp[-j2\pi(f_2\tau_2 - f_1\tau_1)]\,d\tau_1\,d\tau_2$$


Invoking wide-sense stationarity in the time domain, we may reformulate (9.25) as

$$R_{\tilde{H}}(f_1, f_2; \Delta t) = E[\tilde{H}^*(f_1;t)\,\tilde{H}(f_2;t + \Delta t)] \tag{9.27}$$

Equation (9.27) suggests that the correlation function $R_{\tilde{H}}(f_1, f_2; \Delta t)$ may be measured by
using pairs of spaced tones to carry out cross-correlation measurements on the resulting
channel outputs. Such a measurement presumes stationarity in the time domain. If we also
assume stationarity in the frequency domain, we may go one step further and write

$$\begin{aligned} R_{\tilde{H}}(f, f + \Delta f; \Delta t) &= r_{\tilde{H}}(\Delta f;\Delta t) \\ &= E[\tilde{H}^*(f;t)\,\tilde{H}(f + \Delta f;t + \Delta t)] \end{aligned} \tag{9.28}$$

The new correlation function $r_{\tilde{H}}(\Delta f;\Delta t)$, introduced in the first line of (9.28), is in fact the
Fourier transform of the multipath correlation profile $r_{\tilde{h}}(\tau;\Delta t)$ with respect to the
delay-time variable $\tau$, as shown by

$$r_{\tilde{H}}(\Delta f;\Delta t) = \int_{-\infty}^{\infty} r_{\tilde{h}}(\tau;\Delta t)\exp(-j2\pi\tau\Delta f)\,d\tau \tag{9.29}$$

The new function $r_{\tilde{H}}(\Delta f;\Delta t)$ is called the spaced-frequency, spaced-time correlation
function of the channel, where the double use of “spaced” accounts for $\Delta t$ and $\Delta f$.
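The Fourier relationship between the multipath correlation profile and the spaced-frequency, spaced-time correlation function can be checked numerically for $\Delta t = 0$. The sketch below assumes an exponentially decaying profile with unit area, $r(\tau;0) = (1/\tau_0)\exp(-\tau/\tau_0)$ for $\tau \ge 0$, which is a common illustrative choice and not one made in the text; for that profile the transform has the closed form $1/(1 + j2\pi\Delta f\,\tau_0)$. The function name and the value of $\tau_0$ are hypothetical.

```python
import numpy as np

def spaced_frequency_correlation(delta_f, tau, profile):
    """Trapezoidal approximation of the integral over delay of
    profile(tau) * exp(-j 2 pi tau delta_f)."""
    integrand = profile * np.exp(-2j * np.pi * tau * delta_f)
    d_tau = tau[1] - tau[0]
    return d_tau * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

tau0 = 1e-6                                    # assumed decay constant, seconds
tau = np.linspace(0.0, 40.0 * tau0, 80_001)    # delay grid
profile = np.exp(-tau / tau0) / tau0           # assumed exponential profile, unit area

for delta_f in (0.0, 50e3, 1.0 / (2.0 * np.pi * tau0), 1e6):
    numeric = spaced_frequency_correlation(delta_f, tau, profile)
    exact = 1.0 / (1.0 + 2j * np.pi * delta_f * tau0)
    print(f"delta_f = {delta_f / 1e3:8.1f} kHz  |r| numeric = {abs(numeric):.5f}  "
          f"exact = {abs(exact):.5f}")
```

The correlation between the channel responses at two tones falls as their spacing grows, and it falls faster when the profile extends over longer delays.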


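Finally, we introduce another new function denoted by $S(\tau;\nu)$ that forms a
Fourier-transform pair with the multipath correlation profile $r_{\tilde{h}}(\tau;\Delta t)$ with respect to the variable
$\Delta t$; that is, by definition, we have

$$S(\tau;\nu) = \int_{-\infty}^{\infty} r_{\tilde{h}}(\tau;\Delta t)\exp(-j2\pi\nu\Delta t)\,d(\Delta t)$$

for the Fourier transform and

$$r_{\tilde{h}}(\tau;\Delta t) = \int_{-\infty}^{\infty} S(\tau;\nu)\exp(j2\pi\nu\Delta t)\,d\nu \tag{9.31}$$

for the inverse Fourier transform.

The function $S(\tau;\nu)$ may also be defined in terms of $r_{\tilde{H}}(\Delta f;\Delta t)$ by applying a form of
double Fourier transformation: a Fourier transform with respect to the time variable $\Delta t$ and
an inverse Fourier transform with respect to the frequency variable $\Delta f$. That is to say,

$$S(\tau;\nu) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} r_{\tilde{H}}(\Delta f;\Delta t)\exp(-j2\pi\nu\Delta t)\exp(j2\pi\tau\Delta f)\,d(\Delta t)\,d(\Delta f)$$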


Figure 9.10 displays the functional relationships between the three important functions:
$r_{\tilde{h}}(\tau;\Delta t)$, $r_{\tilde{H}}(\Delta f;\Delta t)$, and $S(\tau;\nu)$ in terms of the Fourier transform and its inverse.

The function $S(\tau;\nu)$ is called the scattering function of the channel. For a physical
interpretation of it, consider the transmission of a single tone of frequency $f'$ relative to
the carrier. The complex envelope of the resulting filter output is

$$\tilde{y}(t) = \exp(j2\pi f' t)\,\tilde{H}(f';t)$$

The correlation function of $\tilde{y}(t)$ is given by

$$\begin{aligned} E[\tilde{y}^*(t)\,\tilde{y}(t + \Delta t)] &= \exp(j2\pi f'\Delta t)\,E[\tilde{H}^*(f';t)\,\tilde{H}(f';t + \Delta t)] \\ &= \exp(j2\pi f'\Delta t)\,r_{\tilde{H}}(0;\Delta t) \end{aligned}$$

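where, in the last line, we made use of (9.28). Putting $\Delta f = 0$ in (9.29) and then using
(9.31), we may write

$$\begin{aligned} r_{\tilde{H}}(0;\Delta t) &= \int_{-\infty}^{\infty} r_{\tilde{h}}(\tau;\Delta t)\,d\tau \\ &= \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} S(\tau;\nu)\,d\tau\right]\exp(j2\pi\nu\Delta t)\,d\nu \end{aligned} \tag{9.35}$$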


Hence, we may view the integral inside the square brackets in (9.35), namely

$$\int_{-\infty}^{\infty} S(\tau;\nu)\,d\tau$$

as the power spectral density of the channel output relative to the frequency $f'$ of the
transmitted tone with the Doppler shift $\nu$ acting as the frequency variable. Generalizing
this result, we may now make the statement:

The scattering function $S(\tau;\nu)$ provides a statistical measure of the output power of the
channel, expressed as a function of the time delay $\tau$ and the Doppler shift $\nu$.

Figure 9.10 Functional relationships between the multipath correlation profile $r_{\tilde{h}}(\tau;\Delta t)$, the
spaced-frequency spaced-time correlation function $r_{\tilde{H}}(\Delta f;\Delta t)$, and the scattering function $S(\tau;\nu)$.
$F_\tau[\cdot]$: Fourier transform with respect to delay $\tau$
$F_{\Delta f}^{-1}[\cdot]$: Inverse Fourier transform with respect to frequency increment $\Delta f$
$F_{\Delta t}[\cdot]$: Fourier transform with respect to time increment $\Delta t$
$F_\nu^{-1}[\cdot]$: Inverse Fourier transform with respect to Doppler shift $\nu$


We continue statistical characterization of the wireless channel by putting $\Delta t = 0$ in (9.24)
to obtain

$$\begin{aligned}
P_{\tilde{h}}(\tau) &= r_{\tilde{h}}(\tau;0) \\
&= \mathbb{E}[|\tilde{h}(\tau;t)|^2]
\end{aligned} \tag{9.36}$$

The function $P_{\tilde{h}}(\tau)$ describes the intensity (averaged over the fading fluctuations) of the
scattering process at propagation delay $\tau$ for the WSSUS channel. Accordingly, $P_{\tilde{h}}(\tau)$ is
called the power-delay profile of the channel. In any event, this profile provides an estimate
of the average multipath power expressed as a function of the delay variable $\tau$.

The power-delay profile may also be defined in terms of the scattering function $S(\tau;\nu)$
by averaging it over all possible Doppler shifts. Specifically, setting $\Delta t = 0$ in
(9.31) and then using the first line of (9.36), we obtain

$$P_{\tilde{h}}(\tau) = \int_{-\infty}^{\infty} S(\tau;\nu)\,d\nu \tag{9.37}$$


Figure 9.11 shows an example of the power-delay profile that depicts a typical plot of the
power spectral density versus excess delay; the excess delay is measured with respect to
the time delay for the shortest echo path. The “threshold level” $K$ included in Figure 9.11
defines the power level below which the receiver fails to operate satisfactorily.



Example of a power-delay profile for a mobile radio channel. 

Central Moments of $P_{\tilde{h}}(\tau)$

To characterize the power-delay profile of a WSSUS channel in statistical terms, we begin
with the moment of order zero; that is, the integrated power averaged over the delay
variable $\tau$, as shown by

$$P_{\text{av}} = \int_{-\infty}^{\infty} P_{\tilde{h}}(\tau)\,d\tau \tag{9.38}$$

The average delay, normalized with respect to $P_{\text{av}}$, is defined in terms of the first-order
moment by the formula

$$\tau_{\text{av}} = \frac{1}{P_{\text{av}}}\int_{-\infty}^{\infty} \tau\,P_{\tilde{h}}(\tau)\,d\tau \tag{9.39}$$

Correspondingly, the second-order central moment, normalized with respect to $P_{\text{av}}$, is
defined by the root-mean-square (RMS) formula

$$\sigma_{\tau} = \left[\frac{1}{P_{\text{av}}}\int_{-\infty}^{\infty} (\tau-\tau_{\text{av}})^2\,P_{\tilde{h}}(\tau)\,d\tau\right]^{1/2} \tag{9.40}$$

The new parameter $\sigma_{\tau}$ is called the delay spread, which has acquired a special stature
among the parameters used to characterize the WSSUS channel.
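
The three moments above can be sketched numerically. The following is a minimal illustration, assuming the power-delay profile is available as a finite set of discrete paths, so that the integrals in (9.38) to (9.40) reduce to sums. The function name `delay_moments` and the three-path profile are hypothetical and chosen only for illustration.

```python
import math


def delay_moments(delays, powers):
    """Return (P_av, tau_av, sigma_tau) for a discrete power-delay profile.

    delays: path delays in seconds.
    powers: average power of each path (linear units, not dB).
    For a profile made of discrete paths, the integrals over the delay
    variable become sums over the paths.
    """
    p_av = sum(powers)                                            # zero-order moment
    tau_av = sum(t * p for t, p in zip(delays, powers)) / p_av    # average delay
    var = sum((t - tau_av) ** 2 * p for t, p in zip(delays, powers)) / p_av
    return p_av, tau_av, math.sqrt(var)                           # sqrt gives the delay spread


# Hypothetical three-path profile (assumed values, not measured data).
delays = [0.0, 1e-6, 3e-6]
powers = [1.0, 0.5, 0.25]
p_av, tau_av, sigma_tau = delay_moments(delays, powers)
print(f"P_av = {p_av:.3f}, tau_av = {tau_av:.3e} s, sigma_tau = {sigma_tau:.3e} s")
```

Note that weak, late-arriving paths contribute strongly to the delay spread, because each path is weighted by the square of its distance from the average delay.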

From Chapter 2 on the representation of signals in a linear environment, we recall that 
the duration of a signal in the time domain is inversely related to the bandwidth of the 
signal in the frequency domain. Building on this time-frequency relationship, we may 
define the coherence bandwidth $B_{\text{coherence}}$ of a WSSUS channel as follows:


In words: 


This statement is intuitively satisfying. 


Consider next the issue of relating Doppler effects to time variations of the channel. In
direct contrast to the power-delay profile, this time we set $\Delta f = 0$, which corresponds to the
transmission of a single tone (of some appropriate frequency) over the channel. Under this
condition, the spaced-frequency, spaced-time correlation function of the channel,
described in (9.29), reduces to $r_{\tilde{H}}(0;\Delta t)$. Hence, evaluating the Fourier transform of this
function with respect to the time variable $\Delta t$, we may write

$$S_{\tilde{H}}(\nu) = \int_{-\infty}^{\infty} r_{\tilde{H}}(0;\Delta t)\exp(-j2\pi\nu\Delta t)\,d(\Delta t) \tag{9.42}$$

The function $S_{\tilde{H}}(\nu)$ defines the power spectrum of the channel output expressed as a
function of the Doppler shift $\nu$; it is therefore called the Doppler power spectrum of the
channel.
The Doppler power spectrum of (9.42) may be interpreted in two insightful ways
(Molisch, 2011):

1. The Doppler spectrum describes the frequency dispersion of a wireless channel,
which results in the occurrence of transmission errors in narrowband mobile
wireless communication systems.

2. The Doppler spectrum provides a measure of temporal variability of the channel,
which, in mathematical terms, is described by the channel’s correlation function
$r_{\tilde{H}}(0;\Delta t)$ for $\Delta f = 0$.

As such, we may view the Doppler-power spectrum as another important statistical 
characterization of WSSUS channels. 

The Doppler power spectrum may also be defined in terms of the scattering function by
averaging it over all possible propagation delays, as shown by

$$S_{\tilde{H}}(\nu) = \int_{-\infty}^{\infty} S(\tau;\nu)\,d\tau \tag{9.43}$$
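
As a minimal sketch of how the power-delay profile and the Doppler power spectrum arise as the two marginals of the scattering function, the code below assumes that $S(\tau;\nu)$ has been sampled on a uniform rectangular grid and approximates the two integrals by Riemann sums. The function name `marginals`, the grid spacings, and the 2-by-2 sample array are hypothetical.

```python
def marginals(S, d_tau, d_nu):
    """Approximate the two marginals of a sampled scattering function.

    S[i][j] is the scattering function at the ith delay and jth Doppler shift,
    sampled on a uniform grid with spacings d_tau (s) and d_nu (Hz).
    Returns (power_delay_profile, doppler_spectrum).
    """
    # Power-delay profile: integrate over the Doppler shift at each delay.
    pdp = [sum(row) * d_nu for row in S]
    # Doppler power spectrum: integrate over the delay at each Doppler shift.
    doppler = [sum(col) * d_tau for col in zip(*S)]
    return pdp, doppler


# Hypothetical samples, chosen only to make the arithmetic easy to follow.
S = [[1.0, 2.0],
     [3.0, 4.0]]
pdp, doppler = marginals(S, d_tau=0.5, d_nu=2.0)
print(pdp, doppler)
```

Integrating either marginal over its remaining variable yields the same total power, which is a quick consistency check on any sampled scattering function.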


Typically, the Doppler shift $\nu$ assumes positive and negative values with almost equal
likelihood. The mean Doppler shift is therefore effectively zero. The square root of the
second moment of the Doppler spectrum is thus defined by

$$\sigma_{\nu} = \left(\frac{\displaystyle\int_{-\infty}^{\infty} \nu^2\,S_{\tilde{H}}(\nu)\,d\nu}{\displaystyle\int_{-\infty}^{\infty} S_{\tilde{H}}(\nu)\,d\nu}\right)^{1/2} \tag{9.44}$$

The parameter $\sigma_{\nu}$ provides a measure of the width of the Doppler spectrum; therefore, it is
called the Doppler spread of the channel.

Another useful parameter that is often used in radio propagation measurements is the
fade rate of the channel. For a Rayleigh fading channel, the average fade rate is related to
the Doppler spread $\sigma_{\nu}$ by the empirical rule:

$$f_{\text{fade rate}} = 1.475\,\sigma_{\nu} \ \text{crossings per second} \tag{9.45}$$

As the name implies, the fade rate provides a measure of the rapidity of the channel fading
phenomenon.
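
The Doppler spread and the fade rate can be evaluated numerically from a sampled Doppler spectrum. The sketch below assumes uniformly spaced samples, so that the grid spacing cancels in the ratio of (9.44). For illustration only, it uses a flat spectrum over $[-\nu_{\max}, \nu_{\max}]$ with an assumed $\nu_{\max} = 100$ Hz; this flat shape is a stand-in and not the Jakes spectrum, and it has the closed-form spread $\nu_{\max}/\sqrt{3}$ against which the result can be checked.

```python
import math


def doppler_spread(nu, spectrum):
    """RMS Doppler spread from uniformly spaced samples of the Doppler spectrum.

    nu: Doppler shifts in Hz; spectrum: spectrum values at those shifts.
    The mean Doppler shift is taken to be zero, as assumed in the text.
    """
    second_moment = sum(v * v * s for v, s in zip(nu, spectrum))
    total_power = sum(spectrum)
    return math.sqrt(second_moment / total_power)


def fade_rate(sigma_nu):
    """Empirical average fade rate (crossings per second) for Rayleigh fading."""
    return 1.475 * sigma_nu


# Assumed flat spectrum on [-nu_max, nu_max], sampled at cell midpoints.
nu_max = 100.0
n = 2000
step = 2 * nu_max / n
nu = [-nu_max + (i + 0.5) * step for i in range(n)]
spectrum = [1.0] * n

sigma_nu = doppler_spread(nu, spectrum)
print(f"Doppler spread = {sigma_nu:.2f} Hz, fade rate = {fade_rate(sigma_nu):.1f} crossings/s")
```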

Some typical values encountered in a mobile radio environment are as follows: 

• the delay spread $\sigma_{\tau}$ amounts to about 20 μs;

• the Doppler spread $\sigma_{\nu}$ due to the motion of a vehicle may typically occupy the range
40-100 Hz, but sometimes may well exceed 100 Hz.

One other parameter directly related to the Doppler spread is the coherence time of the
channel. Here again, as with coherence bandwidth discussed previously, we may invoke
the inverse time-frequency relationship to say that the coherence time of a multipath
wireless channel is inversely proportional to the Doppler spread, as shown by

$$\tau_{\text{coherence}} \approx \frac{1}{\sigma_{\nu}} \approx \frac{0.3}{2\nu_{\max}} \tag{9.46}$$

where $\nu_{\max}$ is the maximum Doppler shift due to motion of the mobile unit. In words:


Here again, this statement is intuitively satisfying. 


The particular form of fading experienced by a multipath channel depends on whether the
channel characterization is viewed in the frequency domain or the time domain:

1. When the channel is viewed in the frequency domain, the parameter of concern is
the channel’s coherence bandwidth $B_{\text{coherence}}$, which is a measure of the
transmission bandwidth for which signal distortion across the channel becomes
noticeable. A multipath channel is said to be frequency selective if the coherence
bandwidth of the channel is small compared with the bandwidth of the transmitted
signal. In such a situation, the channel has a filtering effect, in that two sinusoidal
components with a frequency separation greater than the channel’s coherence
bandwidth are treated differently. If, however, the coherence bandwidth of the
channel is large compared with the transmitted signal bandwidth, the fading is said
to be frequency nonselective, or frequency flat.

2. When the channel is viewed in the time domain, the parameter of concern is the
coherence time $\tau_{\text{coherence}}$, which provides a measure of the transmitted signal
duration for which distortion across the channel becomes noticeable. The fading is
said to be time selective if the coherence time of the channel is small compared with
the duration of the received signal (i.e., the time for which the signal is in flight). For
digital transmission, the received signal’s duration is taken as the symbol duration
plus the channel’s delay spread. If, however, the channel’s coherence time is large
compared with the received signal duration, then the fading is said to be time
nonselective, or time flat, in the sense that the channel appears to the transmitted
signal as time invariant.

Illustrating the four classes of multipath channels: $T_c$ = coherence time,
$B_c$ = coherence bandwidth. Horizontal axis: time duration.

In light of this discussion, we may classify multipath channels as follows: 

• Flat-flat channel, which is flat in both frequency and time.

• Frequency-flat channel, which is flat in frequency only.

• Time-flat channel, which is flat in time only.

• Completely non-flat channel, which is flat neither in frequency nor in time; such a
channel is also referred to as a doubly spread channel.

The classification of multipath channels, based on this approach, is shown in Figure 9.12.
The forbidden area, shown shaded in this figure, follows from the inverse relationship that
exists between bandwidth and time duration.
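
The four-way classification can be expressed as a simple decision rule. The sketch below is a simplification: the text speaks of the coherence bandwidth or coherence time being "large" or "small" compared with the signal bandwidth or duration, whereas the code uses a plain greater-than comparison as an assumed threshold. The function name `classify_channel` and the numerical values are hypothetical.

```python
def classify_channel(signal_bandwidth, signal_duration, b_coherence, t_coherence):
    """Classify a multipath channel into one of the four classes.

    All bandwidths in Hz and all durations in seconds. A strict comparison
    is used here as a stand-in for "large compared with" in the text.
    """
    frequency_flat = b_coherence > signal_bandwidth
    time_flat = t_coherence > signal_duration
    if frequency_flat and time_flat:
        return "flat-flat"
    if frequency_flat:
        return "frequency-flat"
    if time_flat:
        return "time-flat"
    return "doubly spread"


# Assumed example: 10 kHz signal, 0.1 ms received duration,
# 100 kHz coherence bandwidth, 10 ms coherence time.
print(classify_channel(1e4, 1e-4, 1e5, 1e-2))
```

In practice a safety margin (for example, requiring the coherence bandwidth to exceed the signal bandwidth by some factor) would replace the strict comparison; the choice of margin is a design decision not specified here.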


FIR Modeling of Doubly Spread Channels 


In Section 9.4, statistical analysis of the doubly spread channel was carried out by focusing
on two complex low-pass entities, namely the impulse response $\tilde{h}(\tau;t)$ and the corresponding
transfer function $\tilde{H}(f;t)$. Therein, mathematical simplification was accomplished by
disposing of the midband frequency $f_c$ of the actual band-pass character of the doubly
spread channel. Despite this simplification, the analytic approach used in Section 9.4 is
highly demanding in mathematical terms. In this section, we will take an “approximate”
approach based on the use of a FIR filter to model the doubly spread channel. From an
engineering perspective, this new approach has a great deal of practical merit.

To begin, we use the convolution integral to describe the input-output relationship of
the system, as shown in (9.20), reproduced here for convenience of presentation

$$\tilde{y}(t) = \frac{1}{2}\int_{-\infty}^{\infty} \tilde{h}(\tau;t)\,\tilde{x}(t-\tau)\,d\tau \tag{9.47}$$

where $\tilde{x}(t)$ is the complex low-pass input signal applied to the channel and $\tilde{y}(t)$ is the
resulting complex low-pass output signal. Although this integral can be formulated in
another equivalent way, the choice made in (9.47) befits modeling of a time-varying FIR
system, as we will see momentarily. Speaking of the input signal $\tilde{x}(t)$, we assume that its
Fourier transform satisfies the condition

$$\tilde{X}(f) = 0 \quad \text{for } |f| > W \tag{9.48}$$

where $2W$ denotes the original input band-pass signal’s bandwidth centered around the
midband frequency $f_c$. With FIR filtering in mind, it is logical to expand the delayed input
signal $\tilde{x}(t-\tau)$ using the sampling theorem, discussed in Chapter 6. Specifically, we write

$$\tilde{x}(t-\tau) = \sum_{n=-\infty}^{\infty} \tilde{x}(t-nT_s)\,\text{sinc}\left(\frac{\tau}{T_s}-n\right) \tag{9.49}$$
where $T_s$ is the sampling period of the FIR filter chosen in accordance with the sampling
theorem as follows:

$$\frac{1}{T_s} > 2W \tag{9.50}$$

The sinc function in (9.49) is defined by

$$\text{sinc}\left(\frac{\tau}{T_s}-n\right) = \frac{\sin[\pi(\tau/T_s-n)]}{\pi(\tau/T_s-n)} \tag{9.51}$$

From the standpoint of the sampling theorem we could set $1/T_s = 2W$, but the choice made
in (9.50) gives us more practical flexibility.

In (9.49) it is important to note that we have done the following: 

• Dependence on the coordinate functions under the summation has been put on the
delay variable $\tau$ in the sinc function.

• Dependence on the time-varying FIR coefficients has been put on time $t$.

This separation of variables is the key to the FIR modeling of a linear time-varying
system. Note also that the sinc functions under the summation in (9.49) are orthogonal but
not normalized.

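Thus, substituting (9.49) into (9.47) and interchanging the order of integration and
summation, which is permitted as we are dealing with a linear system, we get

$$\tilde{y}(t) = \sum_{n=-\infty}^{\infty} \tilde{x}(t-nT_s)\left[\frac{1}{2}\int_{-\infty}^{\infty} \tilde{h}(\tau;t)\,\text{sinc}\left(\frac{\tau}{T_s}-n\right)d\tau\right] \tag{9.52}$$

To simplify matters, we now introduce the complex tap-coefficients $\tilde{c}_n(t)$, defined in
terms of the complex impulse response as follows:

$$\tilde{c}_n(t) = \frac{1}{2}\int_{-\infty}^{\infty} \tilde{h}(\tau;t)\,\text{sinc}\left(\frac{\tau}{T_s}-n\right)d\tau \tag{9.53}$$

Accordingly, we may rewrite (9.52) in the much simplified summation form:

$$\tilde{y}(t) = \sum_{n=-\infty}^{\infty} \tilde{x}(t-nT_s)\,\tilde{c}_n(t) \tag{9.54}$$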


Examining (9.54) for insight, we may make our first observation: 


Turning next to (9.53) for insight, refer to Figure 9.13, where this equation is sketched for
three different settings of the function $\text{sinc}[(\tau/T_s)-n]$; the area shaded in the figure
refers to the complex impulse response $\tilde{h}(\tau;t)$ that is assumed to be causal and occupying
a finite duration. In light of the three different sketches shown in Figure 9.13, we may
make our second observation.

Illustrating the way in which the location of the sinc weighting function shows up for
varying $n$. Panel (a): $n = 0$; plotted quantity: $\tilde{h}(\tau;t)$.


In accordance with these two observations, we may approximate (9.54) as follows:

$$\tilde{y}(t) \approx \sum_{n=0}^{K-1} \tilde{x}(t-nT_s)\,\tilde{c}_n(t) \tag{9.55}$$

where $K$ is the number of taps.

Equation (9.55) defines a complex FIR model for the representation of a complex low-pass
time-varying system characterized by the complex impulse response $\tilde{h}(\tau;t)$. Figure 9.14
depicts a block diagram representation of this model, based on (9.55).
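
A discrete-time sketch of the FIR model may help fix ideas. It assumes that the output is evaluated only at the sampling instants $t = mT_s$, so that the delayed input $\tilde{x}(t - nT_s)$ becomes the sample `x[m - n]` and the tap-coefficient $\tilde{c}_n(t)$ becomes `taps[n][m]`. The function name `fir_channel` and the test signals are hypothetical; input samples before time zero are taken to be zero.

```python
def fir_channel(x, taps):
    """Time-varying complex FIR (tapped-delay-line) channel model.

    x:    list of complex input samples, spaced T_s apart.
    taps: taps[n][m] is the nth tap-coefficient at sample time m,
          so each tap may change from one sample to the next.
    Returns the list of output samples y[m] = sum_n x[m - n] * taps[n][m].
    """
    num_taps = len(taps)
    y = []
    for m in range(len(x)):
        acc = 0j
        for n in range(num_taps):
            if m - n >= 0:               # input is assumed zero before m = 0
                acc += x[m - n] * taps[n][m]
        y.append(acc)
    return y


# A unit impulse through a two-tap channel with constant taps 1 and 0.5j.
impulse = [1, 0, 0, 0]
static_taps = [[1] * 4, [0.5j] * 4]
print(fir_channel(impulse, static_taps))

# A single tap whose gain changes with time scales a constant input.
print(fir_channel([1, 1, 1], [[1, 2, 3]]))
```

With constant taps the model reduces to an ordinary FIR filter; the time variation of the channel enters only through the dependence of each tap on the sample index.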
Complex FIR model of a complex low-pass time-varying channel.


To model the doubly spread channel by means of a FIR filter in accordance with (9.55),
we need to know the sampling rate $1/T_s$ and the number of taps $K$ in this equation. To
satisfy these two practical requirements, we offer the following empirical points:

1. The sampling rate of the FIR filter, $1/T_s$, is much higher than the maximum Doppler
bandwidth of the channel, $\nu_{\max}$; typically, we find that $1/T_s$ is eight to sixteen times
$\nu_{\max}$. Hence, knowing $\nu_{\max}$, we may determine a desirable value of the sampling
rate $1/T_s$.

2. The number of taps $K$ in (9.55) may be determined by truncating the power-delay
profile $P_{\tilde{h}}(\tau)$ of the channel. Specifically, given a measurement of this profile, a
suitable value of $K$ is determined by choosing a threshold level below which the
receiver fails to operate satisfactorily, as illustrated in Figure 9.11.


To generate the tap-coefficients $\tilde{c}_n(t)$, we may use the scheme shown in Figure 9.15 that
involves the following (Jeruchim et al., 2000):

1. A complex white Gaussian process of zero mean and unit variance is used as the
input.

2. A complex low-pass filter of transfer function $\tilde{H}(f)$ is chosen in such a way that it
produces the desired Doppler power spectrum $S_{\tilde{H}}(f)$, where we have used $f$ in place
of the Doppler shift $\nu$ for convenience of presentation. In other words, we may set

$$\begin{aligned}
S_{\tilde{c}}(f) &= S_{\tilde{H}}(f) \\
&= S_{\tilde{w}}(f)\,|\tilde{H}(f)|^2 \\
&= |\tilde{H}(f)|^2
\end{aligned} \tag{9.56}$$

where, in the second line, $S_{\tilde{w}}(f)$ denotes the power spectral density of the white
noise process, which is equal to unity by assumption.

3. The filter is designed in such a way that its output $\tilde{g}(t)$ has a normalized power of
unity.

4. The static gain, denoted by $\sigma_n$, accounts for the different variances of the different
tap-coefficients.

Scheme for generating the nth complex weighting coefficient $\tilde{c}_n(t)$ in the FIR model of
Figure 9.14. Block labels: complex Gaussian white noise process; complex low-pass linear
filter; gain $\sigma_n$; tap input $\tilde{x}(t-nT_s)$; time-varying complex tap-coefficient $\tilde{c}_n(t)$.

Rayleigh Processes 

For complex FIR modeling of a time-varying Rayleigh fading channel, we may use
zero-mean complex Gaussian processes to represent the time-varying tap-coefficients $\tilde{c}_n(t)$,
which, in turn, means that the complex impulse response of the channel $\tilde{h}(\tau;t)$ is also a
zero-mean Gaussian process in the variable $t$.

Moreover, under the assumption of a WSSUS channel, the tap-coefficients $\tilde{c}_n(t)$ for
varying $n$ will be uncorrelated. The power spectral density of each tap-coefficient is
specified by the Doppler spectrum. In particular, the variance $\sigma_n^2$ of the nth weight
function is approximately given by

$$\mathbb{E}[|\tilde{c}_n(t)|^2] \approx T_s^2\,p(nT_s) \tag{9.57}$$

where $T_s$ is the sampling period of the FIR filter and $p(nT_s)$ is a discrete version of the
power-delay profile, $P_{\tilde{h}}(\tau)$.
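
The tap-generation chain (complex white Gaussian noise, low-pass filter with unit output power, static gain) can be sketched as follows. This is an illustration under stated assumptions and not the Doppler filter of the text: a one-pole recursive low-pass filter is used as a simple stand-in, with its drive term scaled so that the output has unit power. The function name `tap_coefficients`, the pole value 0.9, and the tap power 0.25 are hypothetical.

```python
import math
import random


def tap_coefficients(num_samples, tap_power, pole=0.9, seed=0):
    """Generate samples of one Rayleigh-fading tap-coefficient.

    tap_power: desired variance of the tap (the square of the static gain).
    pole:      pole of the assumed one-pole low-pass filter, 0 <= pole < 1;
               values closer to 1 give slower fading.
    """
    rng = random.Random(seed)

    def complex_gaussian():
        # Zero mean, unit variance: variance 1/2 in each of the two components.
        s = math.sqrt(0.5)
        return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

    gain = math.sqrt(tap_power)            # static gain applied after the filter
    drive = math.sqrt(1.0 - pole ** 2)     # keeps the filter output at unit power
    g = complex_gaussian()                 # start the filter in its stationary state
    samples = []
    for _ in range(num_samples):
        samples.append(gain * g)
        g = pole * g + drive * complex_gaussian()
    return samples


taps = tap_coefficients(20000, tap_power=0.25, seed=1)
mean_power = sum(abs(c) ** 2 for c in taps) / len(taps)
print(f"measured tap power = {mean_power:.3f} (target 0.25)")
```

To approximate a specific Doppler spectrum, the one-pole filter would be replaced by a filter whose squared magnitude response matches that spectrum, as required by (9.56).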


Rician-Jakes Doppler Spectrum Model 


The Jakes model, discussed in Example 1, is well suited for describing the Doppler
spectrum for a dense-scattering environment, exemplified by an urban area. However, in a
rural environment, there is a high likelihood for the presence of one strong “direct
line-of-sight” path, for which the FIR-based Rician model is an appropriate candidate. In such an
environment, we may use the Rician-Jakes Doppler spectrum that has the following form
(Tranter et al., 2004):

$$S_{\tilde{c}}(f) = \frac{0.41}{2\pi\nu_{\max}\sqrt{1-(f/\nu_{\max})^2}} + 0.91\,\delta(f \pm 0.7\nu_{\max}) \tag{9.58}$$

where $\nu_{\max}$ is the maximum magnitude of the Doppler shift. This partially empirical
formula, plotted in Figure 9.16, consists of two components: the FIR Jakes filter of Example 1,
and two delta functions at $\pm 0.7\nu_{\max}$ representing a direct line-of-sight signal received.

Illustrating the Rician-Jakes Doppler spectrum of (9.58).

Typically, the sequence defined by $p(nT_s)$ decreases with $n$ in an approximate
exponential manner, eventually reaching a negligibly small value at some time $T_{\max}$. This
exponential approximation of the power-delay profile has been validated experimentally
by many measurements; see Note 4. In any event, the number of taps in the FIR filter, $K$, is
approximately defined by the ratio $T_{\max}/T_s$. The point made here on the number of taps
$K$ substantiates what was said previously on the Jakes model in Example 1 and in point
2 under Some Practical Matters in this section.


Comparison of Modulation Schemes: Effects of Flat Fading 


We bring this first part of the chapter to an end by presenting the effects of flat fading on 
the behavior of different modulation schemes for wireless communications. 

In Chapter 7 we studied the subject of signaling over AWGN channels using different
modulation schemes and evaluated their performance under two different receiver
conditions: coherence and noncoherence. For the purpose of comparison, we have reproduced the
BER for a selected number of those modulation schemes in AWGN in Table 9.1.

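Formulas for the BER of coherent and noncoherent digital receivers

| Signaling scheme | BER, AWGN channel | BER, flat Rayleigh fading channel |
|---|---|---|
| (a) Binary PSK, QPSK, MSK using coherent detection | $Q\left(\sqrt{2E_b/N_0}\right)$ | $\frac{1}{2}\left[1-\sqrt{\frac{\gamma_0}{1+\gamma_0}}\right]$ |
| (b) Binary FSK using coherent detection | $Q\left(\sqrt{E_b/N_0}\right)$ | $\frac{1}{2}\left[1-\sqrt{\frac{\gamma_0}{2+\gamma_0}}\right]$ |
| (c) Binary DPSK | $\frac{1}{2}\exp\left(-\frac{E_b}{N_0}\right)$ | $\frac{1}{2(1+\gamma_0)}$ |
| (d) Binary FSK using noncoherent detection | $\frac{1}{2}\exp\left(-\frac{E_b}{2N_0}\right)$ | $\frac{1}{2+\gamma_0}$ |

$E_b$: transmitted energy per bit; $N_0/2$: power spectral density of channel noise;
$\gamma_0$: mean value of the received energy per bit-to-noise spectral density ratio.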
Table 9.1 also includes the exact formulas for the BER for a flat Rayleigh fading
channel, where the parameter

$$\gamma_0 = \frac{E_b}{N_0}\,\mathbb{E}[\alpha^2] \tag{9.59}$$

is the mean value of the received signal energy per bit-to-noise spectral density ratio. In
(9.59), the expectation $\mathbb{E}[\alpha^2]$ is the mean-square value of the Rayleigh-distributed random
variable $\alpha$ characterizing the channel. The derivations of the fading-channel formulas
listed in the last column of Table 9.1 are addressed in Problems 9.1 and 9.2.

Comparing the formulas for a flat Rayleigh fading channel with the formulas for their
AWGN (i.e., nonfading) channel counterparts, we find that the Rayleigh fading process
results in a severe degradation in the noise performance of a wireless communication
receiver, with the degradation measured in terms of decibels of additional mean SNR.
In particular, the asymptotic decrease in the BER with $\gamma_0$ follows an
inverse law. This form of asymptotic behavior is dramatically different from the case of a
nonfading channel, for which the asymptotic decrease in the BER with $\gamma_0$ follows an
exponential law.

In graphical terms, Figure 9.17 plots the formulas under part (a) of Table 9.1, comparing
the BERs of binary PSK over the AWGN and Rayleigh fading channels. The figure
also includes corresponding plots for the Rician fading channel with different values of the
Rice factor $K$, discussed in Chapter 4. We see that as $K$ increases from zero to infinity, the
behavior of the receiver varies all the way from the Rayleigh channel to the AWGN
channel. The results plotted in Figure 9.17 for the Rician channel were obtained using
simulations (Haykin and Moher, 2005). From Figure 9.17 we see that, as matters stand, we
have a serious problem caused by channel fading. For example, at an SNR of 20 dB and
the presence of Rayleigh fading, the use of binary PSK results in a BER of about $3\times 10^{-3}$,
which is not good enough for the transmission of speech or digital data over the wireless
channel.

Comparison of performance of coherently detected binary PSK over
different fading channels.
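
The contrast between the exponential law and the inverse law can be checked numerically for coherent binary PSK. The sketch below uses the standard closed forms, namely $Q(\sqrt{2\gamma})$ for the AWGN channel (written with the complementary error function) and $\frac{1}{2}[1-\sqrt{\gamma_0/(1+\gamma_0)}]$ for the flat Rayleigh fading channel. The function names are hypothetical.

```python
import math


def ber_bpsk_awgn(snr):
    """BER of coherent binary PSK over AWGN; snr is E_b/N_0 in linear units."""
    # Q(sqrt(2*snr)) expressed through the complementary error function.
    return 0.5 * math.erfc(math.sqrt(snr))


def ber_bpsk_rayleigh(mean_snr):
    """BER of coherent binary PSK over a flat Rayleigh fading channel.

    mean_snr is the mean received E_b/N_0 (gamma_0) in linear units.
    """
    return 0.5 * (1.0 - math.sqrt(mean_snr / (1.0 + mean_snr)))


snr_db = 20.0
snr = 10 ** (snr_db / 10)
print(f"AWGN:     {ber_bpsk_awgn(snr):.3e}")
print(f"Rayleigh: {ber_bpsk_rayleigh(snr):.3e}")
print(f"1/(4*gamma_0) approximation: {1 / (4 * snr):.3e}")
```

At 20 dB the Rayleigh result is close to its large-SNR approximation $1/(4\gamma_0)$, which makes the inverse-law behavior explicit, whereas the AWGN result is smaller by many orders of magnitude.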

Diversity Techniques 


Up to now, we have emphasized the multipath fading phenomenon as an inherent 
characteristic of a wireless channel, which indeed it is. Given this physical reality, how, 
then, do we make the communication process across the wireless channel into a reliable 
operation? The answer to this fundamental question lies in the use of diversity, which may 
be viewed as a form of redundancy in a spatial context. In particular, if several replicas of 
the information-bearing signal can be transmitted simultaneously over independently 
fading channels, then there is a good likelihood that at least one of the received signals will 
not be severely degraded by channel fading. There are several methods for making such a 
provision. In the context of the material covered in this book, we identify three approaches 
to diversity: 

1. Frequency diversity, in which the information-bearing signal is transmitted using
several carriers that are spaced sufficiently apart from each other to provide
independently fading versions of the signal. This may be accomplished by choosing
a frequency spacing equal to or larger than the coherence bandwidth of the channel.

2. Time diversity, in which the same information-bearing signal is transmitted in
different time slots, with the interval between successive time slots being equal to or
greater than the coherence time of the channel. We can still get some diversity if the
interval is less than the coherence time of the channel, but at the expense of
degraded performance. In any event, time diversity may be likened to the use of a
repetition code for error-control coding.

3. Space diversity, in which multiple transmit or receive antennas, or both, are used
with the spacing between adjacent antennas being chosen so as to ensure the
independence of possible fading events occurring in the channel.

Among these three kinds of diversity, space diversity is the subject of interest in the 
second part of this chapter. Depending on which end of the wireless link is equipped with 
multiple antennas, we may identify three different forms of space diversity: 

1. Receive diversity, which involves the use of a single transmit antenna and multiple
receive antennas.

2. Transmit diversity, which involves the use of multiple transmit antennas and a single
receive antenna.

3. Diversity on both transmit and receive, which combines the use of multiple antennas
at both the transmitter and receiver.

Receive diversity is the oldest one of the three, with the other two being of more recent 
origin. In what follows, we will study these three different forms of diversity in this order. 

“Space Diversity-on-Receive” Systems


In “space diversity on receive,” multiple receiving antennas are used with the spacing
between adjacent antennas being chosen so that their respective outputs are essentially
independent of each other. This requirement may be satisfied by spacing adjacent
receiving antennas up to 10 to 20 radio wavelengths apart from each other.
Typically, an elemental spacing of several radio wavelengths is deemed to be adequate for
space diversity on receive. The much larger spacing is needed for elevated base stations,
for which the angle spread of the incoming radio waves is small; note that the spatial
coherence distance is inversely proportional to the angle spread. Through the use of
diversity on receive as described here, we create a corresponding set of fading channels
that are essentially independent. The issue then becomes that of combining the outputs of
these statistically independent fading channels in accordance with a criterion that will
provide improved receiver performance. In this section, we describe three different
diversity-combining systems that share a common feature: they all involve the use of
linear receivers; hence the relative ease of their mathematical tractability.


The block diagram of Figure 9.18 depicts a diversity-combining structure that consists of
two functional blocks: $N_r$ linear receivers and a logic circuit. This diversity system is said
to be of a selection combining kind, in that given the $N_r$ receiver outputs produced by a
common transmitted signal, the logic circuit selects the particular receiver output with the
largest SNR as the received signal. In conceptual terms, selection combining is the
simplest form of space-diversity-on-receive system.

To describe the benefit of selection combining in statistical terms, we assume that the
wireless communication channel is described by a frequency-flat, slowly fading Rayleigh
channel. The implications of this assumption are threefold:

1. The frequency-flat assumption means that all the frequency components constituting
the transmitted signal experience the same random attenuation and phase shift.

2. The slow-fading assumption means that fading remains essentially unchanged
during the transmission of each symbol.

3. The fading phenomenon is described by the Rayleigh distribution.

Let $\tilde{s}(t)$ denote the complex envelope of the modulated signal transmitted during the
symbol interval $0 \le t \le T$. Then, in light of the assumed channel, the complex envelope of
the received signal of the kth diversity branch is defined by

$$\tilde{x}_k(t) = \alpha_k\exp(j\theta_k)\,\tilde{s}(t) + \tilde{w}_k(t), \quad 0 \le t \le T, \quad k = 1, 2, \ldots, N_r \tag{9.60}$$

where, for the kth diversity branch, the fading is represented by the multiplicative term
$\alpha_k\exp(j\theta_k)$ and the additive channel noise is denoted by $\tilde{w}_k(t)$. With the fading assumed
to be slowly varying relative to the symbol duration $T$, we should be able to estimate and then
remove the unknown phase shift $\theta_k$ at each diversity branch with sufficient accuracy, in
which case (9.60) simplifies to

$$\tilde{x}_k(t) \approx \alpha_k\,\tilde{s}(t) + \tilde{w}_k(t), \quad 0 \le t \le T, \quad k = 1, 2, \ldots, N_r$$

Block diagram of selection combiner, using $N_r$ receive antennas.


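The signal component of $\tilde{x}_k(t)$ is $\alpha_k\tilde{s}(t)$ and the noise component is $\tilde{w}_k(t)$. The average
SNR at the output of the kth receiver is therefore

$$(\text{SNR})_k = \frac{\mathbb{E}[|\alpha_k\tilde{s}(t)|^2]}{\mathbb{E}[|\tilde{w}_k(t)|^2]} = \left(\frac{\mathbb{E}[|\tilde{s}(t)|^2]}{\mathbb{E}[|\tilde{w}_k(t)|^2]}\right)\mathbb{E}[\alpha_k^2], \quad k = 1, 2, \ldots, N_r$$

Ordinarily, the mean-square value of $\tilde{w}_k(t)$ is the same for all $k$. Accordingly, we may
express the $(\text{SNR})_k$ as

$$(\text{SNR})_k = \frac{E}{N_0}\,\mathbb{E}[\alpha_k^2], \quad k = 1, 2, \ldots, N_r \tag{9.62}$$

where $E$ is the symbol energy and $N_0/2$ is the noise spectral density. For binary data, $E$
equals the transmitted signal energy per bit $E_b$.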

Let $\gamma_k$ denote the instantaneous SNR measured at the output of the kth receiver during
the transmission of a given symbol. Then, replacing the mean-square value $\mathbb{E}[|\alpha_k|^2]$ by the
instantaneous value $|\alpha_k|^2$ in (9.62), we may write

$$\gamma_k = \frac{E}{N_0}\,\alpha_k^2, \quad k = 1, 2, \ldots, N_r \tag{9.63}$$

Under the assumption that the random amplitude $\alpha_k$ is Rayleigh distributed, the squared
amplitude $\alpha_k^2$ will be exponentially distributed (i.e., chi-squared with two degrees of
freedom, discussed in Appendix A). If we further assume that the average SNR over the
short-term fading is the same, namely $\gamma_{\text{av}}$, for all the $N_r$ diversity branches, then we may
express the probability density functions of the random variables $\Gamma_k$ pertaining to the
individual branches as follows:

$$f_{\Gamma_k}(\gamma_k) = \frac{1}{\gamma_{\text{av}}}\exp\left(-\frac{\gamma_k}{\gamma_{\text{av}}}\right), \quad \gamma_k \ge 0, \quad k = 1, 2, \ldots, N_r \tag{9.64}$$
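For some SNR $\gamma$, the associated cumulative distributions of the individual branches are
described by

$$\begin{aligned}
\mathbb{P}(\gamma_k \le \gamma) &= \int_{0}^{\gamma} f_{\Gamma_k}(\gamma_k)\,d\gamma_k \\
&= 1 - \exp\left(-\frac{\gamma}{\gamma_{\text{av}}}\right), \quad \gamma \ge 0
\end{aligned} \tag{9.65}$$

for $k = 1, 2, \ldots, N_r$. Since, by design, the $N_r$ diversity branches are essentially statistically
independent, the probability that all the diversity branches have an SNR less than the
threshold $\gamma$ is the product of the individual probabilities that $\gamma_k < \gamma$ for all $k$; thus, using
(9.64) in (9.65), we write

$$\begin{aligned}
\mathbb{P}(\gamma_k < \gamma \ \text{for } k = 1, 2, \ldots, N_r) &= \prod_{k=1}^{N_r} \mathbb{P}(\gamma_k < \gamma) \\
&= \prod_{k=1}^{N_r}\left[1 - \exp\left(-\frac{\gamma}{\gamma_{\text{av}}}\right)\right] \\
&= \left[1 - \exp\left(-\frac{\gamma}{\gamma_{\text{av}}}\right)\right]^{N_r}, \quad \gamma \ge 0
\end{aligned} \tag{9.66}$$

Note that the probability in (9.66) decreases with increasing $N_r$.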

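The cumulative distribution function of (9.66) is the same as the cumulative
distribution function of the random variable $\Gamma$ described by the sample value

$$\gamma_{\text{sc}} = \max\{\gamma_1, \gamma_2, \ldots, \gamma_{N_r}\} \tag{9.67}$$

which is less than the threshold $\gamma$ if, and only if, the individual SNRs $\gamma_1, \gamma_2, \ldots, \gamma_{N_r}$ are all
less than $\gamma$. Indeed, the cumulative distribution function of the selection combiner (i.e., the
probability that all of the $N_r$ diversity branches have an SNR less than $\gamma_{\text{sc}}$) is given by

$$F_{\Gamma}(\gamma_{\text{sc}}) = \left[1 - \exp\left(-\frac{\gamma_{\text{sc}}}{\gamma_{\text{av}}}\right)\right]^{N_r}, \quad \gamma_{\text{sc}} \ge 0 \tag{9.68}$$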

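By definition, the probability density function $f_{\Gamma}(\gamma_{\text{sc}})$ is the derivative of the cumulative
distribution function $F_{\Gamma}(\gamma_{\text{sc}})$ with respect to the argument $\gamma_{\text{sc}}$. Hence, differentiating
(9.68) with respect to $\gamma_{\text{sc}}$ yields

$$\begin{aligned}
f_{\Gamma}(\gamma_{\text{sc}}) &= \frac{d}{d\gamma_{\text{sc}}}F_{\Gamma}(\gamma_{\text{sc}}) \\
&= \frac{N_r}{\gamma_{\text{av}}}\exp\left(-\frac{\gamma_{\text{sc}}}{\gamma_{\text{av}}}\right)\left[1 - \exp\left(-\frac{\gamma_{\text{sc}}}{\gamma_{\text{av}}}\right)\right]^{N_r-1}, \quad \gamma_{\text{sc}} \ge 0
\end{aligned} \tag{9.69}$$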

For convenience of graphical presentation, we use the scaled probability density function

$$f_X(x) = \gamma_{\text{av}}\,f_{\Gamma}(\gamma_{\text{sc}})$$

where the sample value $x$ of the normalized variable $X$ is defined by

$$x = \gamma_{\text{sc}}/\gamma_{\text{av}}$$

Figure 9.19 plots $f_X(x)$ versus $x$ for varying number of receive-diversity branches $N_r$ under
the assumption that the short-term SNRs for all the $N_r$ branches share the common value
$\gamma_{\text{av}}$. From this figure we make two observations:

1. As the number of diversity branches $N_r$ is increased, the probability density function
$f_X(x)$ of the normalized random variable $X = \Gamma/\gamma_{\text{av}}$ progressively moves to the
right.

2. The probability density function $f_X(x)$ becomes more and more symmetrical and,
therefore, Gaussian as $N_r$ is increased.

Stated in another way, a frequency-flat, slowly fading Rayleigh channel is modified through
the use of selection combining into a Gaussian channel provided that the number of diversity
channels $N_r$ is sufficiently large. Realizing that a Gaussian channel is a digital communication
theorist’s dream, we now see the practical benefit of using selection combining.

Normalized probability density function $f_X(x) = N_r\exp(-x)[1-\exp(-x)]^{N_r-1}$
for a varying number $N_r$ of receive antennas. Horizontal axis: $x$.

According to the theory described herein, the selection-combining procedure requires
that we monitor the receiver outputs in a continuous manner and, at each instant of time,
select the receiver with the strongest signal (i.e., the largest instantaneous SNR). From a
practical perspective, such a selective procedure is rather cumbersome. We may overcome
this practical difficulty by adopting a scanning version of the selection-combining
procedure:

• Start the procedure by selecting the receiver with the strongest output signal.

• Continue using the output of this particular receiver as the combiner’s output so long
as its instantaneous SNR does not drop below a prescribed threshold.

• As soon as the instantaneous SNR of the combiner falls below the threshold, select a
new receiver that offers the strongest output signal and continue the procedure.

This technique has a performance very similar to the nonscanning version of selective
diversity.
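
The cumulative distribution function of the selection combiner can be checked by simulation. The sketch below draws independent, exponentially distributed branch SNRs (the distribution that follows from Rayleigh-distributed amplitudes), keeps the largest one, and compares the fraction of trials falling below a threshold with the closed form $[1-\exp(-\gamma/\gamma_{\text{av}})]^{N_r}$. The function names, the number of trials, and the seed are hypothetical choices.

```python
import math
import random


def sc_cdf(gamma, gamma_av, n_r):
    """Closed-form probability that the selection combiner output is below gamma."""
    return (1.0 - math.exp(-gamma / gamma_av)) ** n_r


def simulate_sc(gamma, gamma_av, n_r, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    below = 0
    for _ in range(trials):
        # Each branch SNR is exponential with mean gamma_av; the combiner
        # selects the branch with the largest instantaneous SNR.
        best = max(rng.expovariate(1.0 / gamma_av) for _ in range(n_r))
        if best < gamma:
            below += 1
    return below / trials


for n_r in (1, 2, 3):
    theory = sc_cdf(1.0, 1.0, n_r)
    estimate = simulate_sc(1.0, 1.0, n_r)
    print(f"N_r = {n_r}: theory {theory:.4f}, simulation {estimate:.4f}")
```

The probability of the output falling below the threshold shrinks geometrically with the number of branches, which is the source of the diversity benefit.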

Outage Probability of Selection Combiner 

The outage probability of a diversity combiner is defined as the percentage of time the
instantaneous output SNR of the combiner is below some prescribed level for a specified
number of branches. Using the cumulative distribution function of (9.68), Figure 9.20
plots the outage curves for the selection combiner with $N_r$ as the running parameter. The
horizontal axis of the figure represents the instantaneous output SNR of the combiner
relative to 0 dB (i.e., the 50-percentile point for $N_r = 1$) and the vertical axis represents the
outage probability, expressed as a percentage. From the figure we observe the following:


Outage probability for selection combining for a varying number $N_r$ of receive antennas.
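The outage curves for the selection combiner can be reproduced numerically. The sketch below assumes that the cumulative distribution function of (9.68) has the form $[1 - \exp(-\gamma/\gamma_{\text{av}})]^{N_r}$, which is the form consistent with the normalized density $N_r \exp(-x)[1 - \exp(-x)]^{N_r - 1}$ of the selection combiner. The function name `selection_outage` is hypothetical, and the 0 dB reference is taken to be the median for $N_r = 1$, that is, $\gamma/\gamma_{\text{av}} = \ln 2$.

```python
import math

def selection_outage(snr_db, num_branches):
    """Outage probability of a selection combiner in Rayleigh fading.

    snr_db is the output SNR relative to 0 dB, where 0 dB is assumed to be the
    50-percentile point for a single branch (gamma / gamma_av = ln 2).
    """
    x = math.log(2.0) * 10.0 ** (snr_db / 10.0)   # normalized SNR gamma / gamma_av
    return (1.0 - math.exp(-x)) ** num_branches

for n_r in (1, 2, 4):
    row = [100.0 * selection_outage(db, n_r) for db in (-20, -10, 0)]
    print(n_r, ["%.4f%%" % p for p in row])
```

For a fixed SNR level the outage probability is the single-branch value raised to the power $N_r$, which is why each added branch steepens the curve.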
The selection-combining technique just described is relatively straightforward to
implement. However, from a performance point of view, it is not optimum, in that it
ignores the information available from all the diversity branches except for the particular
branch that produces the largest instantaneous power of its own demodulated signal.

This limitation of the selection combiner is mitigated by the maximal-ratio combiner,
the composition of which is described by the block diagram of Figure 9.21 that consists of
$N_r$ linear receivers followed by a linear combiner. Using the complex envelope of the
received signal at the $k$th diversity branch given in (9.60), the corresponding complex
envelope of the linear combiner output is defined by

$$
\begin{aligned}
y(t) &= \sum_{k=1}^{N_r} a_k x_k(t) \\
     &= \sum_{k=1}^{N_r} a_k [\alpha_k \exp(j\theta_k) s(t) + w_k(t)] \\
     &= s(t) \sum_{k=1}^{N_r} a_k \alpha_k \exp(j\theta_k) + \sum_{k=1}^{N_r} a_k w_k(t)
\end{aligned}
\qquad (9.70)
$$

where the $a_k$ are complex weighting parameters that characterize the linear combiner.
These parameters are changed from instant to instant in accordance with signal variations
in the $N_r$ diversity branches over the short-term fading process. The requirement is to
design the linear combiner so as to maximize the output SNR of the combiner at each
instant of time. From (9.70), we note the following two points:


1. The complex envelope of the output signal equals the first expression
$s(t) \sum_{k=1}^{N_r} a_k \alpha_k \exp(j\theta_k)$.

2. The complex envelope of the output noise equals the second expression
$\sum_{k=1}^{N_r} a_k w_k(t)$.


Block diagram of maximal-ratio combiner using $N_r$ receive antennas.
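Assuming that the $w_k(t)$ are mutually independent for $k = 1, 2, \ldots, N_r$, the output SNR of
the linear combiner is therefore given by

$$
\begin{aligned}
(\text{SNR})_c
&= \frac{E\left[\left| s(t) \sum_{k=1}^{N_r} a_k \alpha_k \exp(j\theta_k) \right|^2\right]}
        {E\left[\left| \sum_{k=1}^{N_r} a_k w_k(t) \right|^2\right]} \\
&= \frac{E[|s(t)|^2]}{E[|w_k(t)|^2]} \cdot
   \frac{E\left[\left| \sum_{k=1}^{N_r} a_k \alpha_k \exp(j\theta_k) \right|^2\right]}
        {E\left[\sum_{k=1}^{N_r} |a_k|^2\right]} \\
&= \frac{E}{N_0} \cdot
   \frac{E\left[\left| \sum_{k=1}^{N_r} a_k \alpha_k \exp(j\theta_k) \right|^2\right]}
        {E\left[\sum_{k=1}^{N_r} |a_k|^2\right]}
\end{aligned}
\qquad (9.71)
$$

where $E/N_0$ is the symbol energy-to-noise spectral density ratio.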

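Let $\gamma_c$ denote the instantaneous output SNR of the linear combiner. Then, using the two
terms

$$
\left| \sum_{k=1}^{N_r} a_k \alpha_k \exp(j\theta_k) \right|^2
\quad \text{and} \quad
\sum_{k=1}^{N_r} |a_k|^2
$$

as the instantaneous values of the expectations in the numerator and denominator of
(9.71), respectively, we may write

$$
\gamma_c = \frac{E}{N_0} \cdot
\frac{\left| \sum_{k=1}^{N_r} a_k \alpha_k \exp(j\theta_k) \right|^2}{\sum_{k=1}^{N_r} |a_k|^2}
\qquad (9.72)
$$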

The requirement is to maximize $\gamma_c$ with respect to the $a_k$. This maximization may be
carried out by following the standard differentiation procedure, recognizing that the
weighting parameters $a_k$ are complex. However, we choose to follow a simpler procedure
based on the Schwarz inequality, which was discussed in Chapter 7.
Let $a_k$ and $b_k$ denote any two complex numbers for $k = 1, 2, \ldots, N_r$. According to the
Schwarz inequality for complex parameters, we have

$$
\left| \sum_{k=1}^{N_r} a_k b_k \right|^2 \le \sum_{k=1}^{N_r} |a_k|^2 \sum_{k=1}^{N_r} |b_k|^2
$$

which holds with equality for $a_k = c\, b_k^*$, where $c$ is some arbitrary complex constant and
the asterisk denotes complex conjugation.

Thus, applying the Schwarz inequality to the instantaneous output SNR of (9.72), with
$a_k$ left intact and $b_k$ set equal to $\alpha_k \exp(j\theta_k)$, we obtain

$$
\gamma_c \le \frac{E}{N_0} \cdot
\frac{\sum_{k=1}^{N_r} |a_k|^2 \sum_{k=1}^{N_r} |\alpha_k \exp(j\theta_k)|^2}{\sum_{k=1}^{N_r} |a_k|^2}
$$

Canceling common terms in the numerator and denominator, we readily obtain

$$
\gamma_c \le \frac{E}{N_0} \sum_{k=1}^{N_r} \alpha_k^2 \qquad (9.74)
$$

Equation (9.74) proves that, in general, $\gamma_c$ cannot exceed $\sum_k \gamma_k$, where $\gamma_k$ is as defined in
(9.63). The equality in (9.74) holds for

$$
\begin{aligned}
a_k &= c[\alpha_k \exp(j\theta_k)]^* \\
    &= c\,\alpha_k \exp(-j\theta_k), \qquad k = 1, 2, \ldots, N_r
\end{aligned}
\qquad (9.75)
$$

where $c$ is some arbitrary complex constant.

Equation (9.75) defines the complex weighting parameters of the maximal-ratio
combiner. Based on this equation, we may state that the optimal weighting factor $a_k$ for the
$k$th diversity branch has a magnitude proportional to the signal amplitude $\alpha_k$ and a phase
that cancels the signal phase $\theta_k$ to within some value that is identical for all the $N_r$ diversity
branches. The phase alignment just described has an important implication: it permits the
fully coherent addition of the $N_r$ receiver outputs by the linear combiner.

Equation (9.74) with the equality sign defines the instantaneous output SNR of the
maximal-ratio combiner, which is written as

$$
\gamma_{\text{mrc}} = \frac{E}{N_0} \sum_{k=1}^{N_r} \alpha_k^2 \qquad (9.76)
$$

According to (9.62), $(E/N_0)\alpha_k^2$ is the instantaneous output SNR of the $k$th diversity
branch. Hence, the maximal-ratio combiner produces an instantaneous output SNR that is
the sum of the instantaneous SNRs of the individual branches; that is,

$$
\gamma_{\text{mrc}} = \sum_{k=1}^{N_r} \gamma_k \qquad (9.77)
$$
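The optimality of the weights in (9.75) can be checked numerically. The following sketch evaluates the instantaneous output SNR of (9.72) for an arbitrary set of weights and compares it with the value obtained using the maximal-ratio weights; the branch amplitudes and phases are illustrative numbers, and the function names `combiner_snr` and `mrc_weights` are ours.

```python
import cmath

def combiner_snr(weights, alphas, thetas, e_over_n0=1.0):
    """Instantaneous output SNR of a linear combiner, following (9.72)."""
    signal = sum(a * al * cmath.exp(1j * th)
                 for a, al, th in zip(weights, alphas, thetas))
    noise = sum(abs(a) ** 2 for a in weights)
    return e_over_n0 * abs(signal) ** 2 / noise

def mrc_weights(alphas, thetas, c=1.0):
    """Maximal-ratio weights a_k = c * alpha_k * exp(-j theta_k), following (9.75)."""
    return [c * al * cmath.exp(-1j * th) for al, th in zip(alphas, thetas)]

# Illustrative branch amplitudes and phases (assumed values).
alphas = [0.5, 1.2, 0.8]
thetas = [0.3, -1.0, 2.0]

snr_mrc = combiner_snr(mrc_weights(alphas, thetas), alphas, thetas)
snr_equal = combiner_snr([1.0, 1.0, 1.0], alphas, thetas)
print("MRC SNR:", snr_mrc, "sum of branch SNRs:", sum(al ** 2 for al in alphas))
print("unit weights without phase alignment:", snr_equal)
```

With $E/N_0 = 1$ the maximal-ratio result coincides with $\sum_k \alpha_k^2$, in agreement with (9.76), while any other choice of weights gives a value that is no larger.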
The term “maximal-ratio combiner” has been coined to describe the combiner of Figure
9.21 that produces the optimum result given in (9.77). Indeed, we deduce from this result
that the instantaneous output SNR of the maximal-ratio combiner can be large even when
the SNRs of the individual branches are small. Since the instantaneous SNR produced by
the selection combiner is simply the largest among the $N_r$ terms of (9.77), it follows that
the selection combiner cannot produce a larger instantaneous output SNR than the
maximal-ratio combiner.


The maximal SNR $\gamma_{\text{mrc}}$ is the sample value of a random variable denoted by $\Gamma$. According to
(9.76), $\gamma_{\text{mrc}}$ is equal to the sum of $N_r$ exponentially distributed random variables for a
frequency-flat, slowly fading Rayleigh channel. From Appendix A, the probability density
function of such a sum is known to be chi-square with $2N_r$ degrees of freedom; that is,

$$
f_\Gamma(\gamma_{\text{mrc}}) = \frac{1}{(N_r - 1)!} \cdot
\frac{\gamma_{\text{mrc}}^{N_r - 1}}{\gamma_{\text{av}}^{N_r}}
\exp\left(-\frac{\gamma_{\text{mrc}}}{\gamma_{\text{av}}}\right)
\qquad (9.78)
$$

Note that for $N_r = 1$, (9.69) and (9.78) assume the same value, which is to be expected.

Figure 9.22 plots the scaled probability density function, $f_X(x) = \gamma_{\text{av}} f_\Gamma(\gamma_{\text{mrc}})$, versus
the normalized variable $x = \gamma_{\text{mrc}}/\gamma_{\text{av}}$ for varying $N_r$. Based on this figure, we may make
observations similar to those for the selection combiner, except for the fact that for any $N_r > 1$
we find that the scaled probability density function for the maximal-ratio combiner is
radically different from its counterpart for the selection combiner.

Outage Probability for Maximal-Ratio Combiner 

The cumulative distribution function for the maximal-ratio combiner is defined by

$$
\begin{aligned}
P(\gamma_{\text{mrc}} < x) &= \int_0^x f_\Gamma(\gamma_{\text{mrc}})\, d\gamma_{\text{mrc}} \\
&= 1 - \int_x^\infty f_\Gamma(\gamma_{\text{mrc}})\, d\gamma_{\text{mrc}}
\end{aligned}
\qquad (9.79)
$$

where the probability density function $f_\Gamma(\gamma_{\text{mrc}})$ is itself defined by (9.78). Using (9.79),
Figure 9.23 plots the outage probability for the maximal-ratio combiner with $N_r$ as a running
parameter. Comparing this figure with that of Figure 9.20 for selection combining, we see
that the outage-probability curves for these two diversity techniques are superficially similar.
The diversity gain, defined as the $E/N_0$ saving at a given BER, provides a measure of the
effectiveness of a diversity technique on an outage-probability basis.

Outage probability of maximal-ratio combiner for a varying number $N_r$ of
receiver antennas.
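The integral in (9.79) can be evaluated in closed form for the density of (9.78). The sketch below uses the standard result $1 - \exp(-x)\sum_{m=0}^{N_r-1} x^m/m!$ with $x = \gamma/\gamma_{\text{av}}$, cross-checks it against a direct numerical integration of the density, and compares it with the selection combiner under the assumption that the latter has the distribution function $[1 - \exp(-x)]^{N_r}$. All function names are ours.

```python
import math

def mrc_pdf(x, n_r):
    """Scaled density of (9.78), expressed in the normalized variable x."""
    return x ** (n_r - 1) * math.exp(-x) / math.factorial(n_r - 1)

def mrc_outage(x, n_r):
    """Closed-form value of the integral in (9.79) for the density above."""
    partial_sum = sum(x ** m / math.factorial(m) for m in range(n_r))
    return 1.0 - math.exp(-x) * partial_sum

def mrc_outage_numeric(x, n_r, steps=20000):
    """Midpoint-rule integration of mrc_pdf from 0 to x (cross-check only)."""
    dx = x / steps
    return sum(mrc_pdf((i + 0.5) * dx, n_r) for i in range(steps)) * dx

def selection_outage(x, n_r):
    """Assumed distribution function of the selection combiner."""
    return (1.0 - math.exp(-x)) ** n_r

for n_r in (1, 2, 4):
    x = 0.5
    print(n_r, mrc_outage(x, n_r), mrc_outage_numeric(x, n_r), selection_outage(x, n_r))
```

For $N_r = 1$ the two combiners give the same outage probability; for larger $N_r$ the maximal-ratio combiner has the smaller outage probability at the same normalized SNR level.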
In a theoretical context, the maximal-ratio combiner is the optimum among linear diversity
combining techniques, optimum in the sense that it produces the largest possible value of
instantaneous output SNR. However, in practical terms, there are three important issues to
keep in mind:

1. Significant instrumentation is needed to adjust the complex weighting parameters of
the maximal-ratio combiner to their exact values, in accordance with (9.75).

2. The additional improvement in output SNR gained by the maximal-ratio combiner
over the selection combiner is not that large, and it is quite likely that the additional
improvement in receiver performance is lost in not being able to achieve the exact
setting of the maximal-ratio combiner.

3. So long as a linear combiner uses the diversity branch with the strongest signal, then
other details of the combiner may result in a minor improvement in overall receiver
performance.

Issue 3 points to formulation of the so-called equal-gain combiner, in which all the
complex weighting parameters $a_k$ have their phase angles set opposite to those of their
respective multipath branches in accordance with (9.75). But, unlike the $a_k$ in the
maximal-ratio combiner, their magnitudes are set equal to some constant value, unity for
convenience of use.

“Space Diversity-on-Transmit” Systems 


In the wireless communications literature, space diversity-on-transmit techniques are
commonly referred to as orthogonal space-time block codes (Tarokh et al., 1999). This
terminology is justified on the following grounds:

The transmitted symbols form an orthogonal set. 

The transmission of incoming data streams is carried out on a block-by-block basis. 

Space and time constitute the coordinates of each transmitted block of symbols. 

In a generic sense, Figure 9.24 presents the baseband diagram of a space-time block
encoder, which consists of two functional units: mapper and block encoder. The mapper
takes the incoming binary data stream $\{b_k\}$, where $b_k = \pm 1$, and generates a new sequence
of blocks with each block made up of multiple symbols that are complex. For example, the
mapper may be in the form of an M-ary PSK or M-ary QAM message constellation, which
are illustrated for M = 16 in the signal-space diagrams of Figure 9.25. All the symbols in a
particular column of the transmission matrix are pulse-shaped (in accordance with the
criteria described in Chapter 8) and then modulated into a form suitable for simultaneous
transmission over the channel by the transmit antennas. The pulse shaper and modulator
are not shown in Figure 9.24 as the basic issue of interest is that of baseband data
transmission with emphasis on the formulation of space-time block codes. The block
encoder converts each block of complex symbols produced by the mapper into an $l$-by-$N_t$
transmission matrix $\mathbf{S}$, where $l$ and $N_t$ are respectively the temporal dimension and spatial
dimension of the transmission matrix. The individual elements of the transmission
matrix $\mathbf{S}$ are made up of linear combinations of $\tilde{s}_k$ and $\tilde{s}_k^*$, where the $\tilde{s}_k$ are complex
symbols and the $\tilde{s}_k^*$ are their complex conjugates.

(a) Signal constellation of 16-PSK. (b) Signal constellation of 16-QAM.


Quadriphase Shift Keying 

As a simple example, consider the map portrayed by QPSK, for which M = 4. This map is
described in Table 9.2, where $E$ is the transmitted signal energy per symbol.

The input dibits (pairs of binary bits) are Gray encoded, wherein only one bit is flipped
as we move from one symbol to the next. (Gray encoding was discussed in Section 7.6
under “Quadriphase Shift Keying”.) The mapped signal points lie on a circle of radius $\sqrt{E}$
centered at the origin of the signal-space diagram.


Gray-encoded QPSK mapper 
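A Gray-encoded QPSK mapper of this kind can be sketched as follows. The entries of Table 9.2 are not reproduced here, so the dibit-to-phase assignment below (10, 00, 01, 11 at phases $\pi/4$, $3\pi/4$, $5\pi/4$, $7\pi/4$) is an assumption chosen only to satisfy the Gray property; bits are written as 0/1 rather than $\pm 1$, and the names `GRAY_QPSK_PHASES` and `qpsk_map` are ours.

```python
import cmath
import math

# Assumed Gray assignment: neighbouring phases differ in exactly one bit.
GRAY_QPSK_PHASES = {
    (1, 0): math.pi / 4,
    (0, 0): 3 * math.pi / 4,
    (0, 1): 5 * math.pi / 4,
    (1, 1): 7 * math.pi / 4,
}

def qpsk_map(bits, energy=1.0):
    """Map a 0/1 bit sequence to QPSK points on a circle of radius sqrt(energy)."""
    if len(bits) % 2 != 0:
        raise ValueError("QPSK needs an even number of bits")
    radius = math.sqrt(energy)
    return [radius * cmath.exp(1j * GRAY_QPSK_PHASES[(bits[i], bits[i + 1])])
            for i in range(0, len(bits), 2)]

symbols = qpsk_map([1, 0, 0, 0, 0, 1, 1, 1], energy=4.0)
for s in symbols:
    print("%.3f%+.3fj  |s| = %.3f" % (s.real, s.imag, abs(s)))
```

Every mapped point has magnitude $\sqrt{E}$, and moving to an adjacent point on the circle changes only one bit of the dibit.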
Example 6 is illustrative of the Alamouti code, which is one of the first space-time block
codes involving the use of two transmit antennas and one single receive antenna (Alamouti,
1998). Figure 9.26 shows a baseband block diagram of this highly popular spatial code.

Let $\tilde{s}_1$ and $\tilde{s}_2$ denote the complex symbols produced by the code’s mapper, which are
to be transmitted over the multipath wireless channel by two transmit antennas. Signal
transmission over the channel proceeds as follows:

1. At some arbitrary time $t$, antenna 1 transmits $\tilde{s}_1$ and simultaneously antenna 2
transmits $\tilde{s}_2$.

2. At time $t + T$, where $T$ is the symbol duration, signal transmission is switched to $-\tilde{s}_2^*$
transmitted by antenna 1 and simultaneously $\tilde{s}_1^*$ is transmitted by antenna 2.

The resulting two-by-two space-time block code is written in matrix form as follows:

$$
\mathbf{S} =
\begin{bmatrix}
\tilde{s}_1 & \tilde{s}_2 \\
-\tilde{s}_2^* & \tilde{s}_1^*
\end{bmatrix}
\qquad (9.80)
$$

where the columns span space (transmit antenna 1 and transmit antenna 2) and the rows
span time (the first row is transmitted at time $t$ and the second row at time $t + T$).

Block diagram of the transceiver (transmitter and receiver) for the Alamouti code, showing
the two transmit antennas and their multiplicative path coefficients. Note that $t' > t$ to allow
for propagation delay.
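This transmission matrix is a complex-orthogonal matrix (quaternion) in that it satisfies
the condition for orthogonality in both the spatial and temporal senses. To demonstrate
this important property of the Alamouti code, let

$$
\mathbf{S}^\dagger =
\begin{bmatrix}
\tilde{s}_1^* & -\tilde{s}_2 \\
\tilde{s}_2^* & \tilde{s}_1
\end{bmatrix}
$$

denote the Hermitian transpose of $\mathbf{S}$, which involves both transposition and complex
conjugation. To demonstrate orthogonality in the spatial sense, we multiply the code
matrix $\mathbf{S}$ by its Hermitian transpose $\mathbf{S}^\dagger$ on the right, obtaining

$$
\begin{aligned}
\mathbf{S}\mathbf{S}^\dagger
&= \begin{bmatrix} \tilde{s}_1 & \tilde{s}_2 \\ -\tilde{s}_2^* & \tilde{s}_1^* \end{bmatrix}
   \begin{bmatrix} \tilde{s}_1^* & -\tilde{s}_2 \\ \tilde{s}_2^* & \tilde{s}_1 \end{bmatrix} \\
&= \begin{bmatrix}
   |\tilde{s}_1|^2 + |\tilde{s}_2|^2 & -\tilde{s}_1\tilde{s}_2 + \tilde{s}_2\tilde{s}_1 \\
   -\tilde{s}_2^*\tilde{s}_1^* + \tilde{s}_1^*\tilde{s}_2^* & |\tilde{s}_2|^2 + |\tilde{s}_1|^2
   \end{bmatrix} \\
&= (|\tilde{s}_1|^2 + |\tilde{s}_2|^2)
   \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\end{aligned}
$$

Since the right-hand side of (9.81) is real valued, it follows that the alternative matrix
product $\mathbf{S}^\dagger\mathbf{S}$, viewed in the temporal sense, yields exactly the same result. That is,

$$
\mathbf{S}\mathbf{S}^\dagger = \mathbf{S}^\dagger\mathbf{S} = (|\tilde{s}_1|^2 + |\tilde{s}_2|^2)\,\mathbf{I} \qquad (9.83)
$$

where $\mathbf{I}$ is the two-by-two identity matrix.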
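The orthogonality property can be verified numerically for any pair of complex symbols. The sketch below builds the transmission matrix of the Alamouti code with rows as time slots and columns as transmit antennas, and forms both matrix products; the symbol values are illustrative and the helper names are ours.

```python
def alamouti_matrix(s1, s2):
    """Rows are time slots (t, t + T); columns are transmit antennas 1 and 2."""
    return [[s1, s2],
            [-s2.conjugate(), s1.conjugate()]]

def hermitian(m):
    """Hermitian transpose: transposition plus complex conjugation."""
    rows, cols = len(m), len(m[0])
    return [[m[r][c].conjugate() for r in range(rows)] for c in range(cols)]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

s1, s2 = complex(1, 1), complex(-1, 1)       # illustrative QPSK-like symbols
S = alamouti_matrix(s1, s2)
spatial = matmul(S, hermitian(S))            # S S^dagger
temporal = matmul(hermitian(S), S)           # S^dagger S
scale = abs(s1) ** 2 + abs(s2) ** 2

print("scale factor:", scale)
print("S S^dagger  :", spatial)
print("S^dagger S  :", temporal)
```

Both products come out as the scale factor $|\tilde{s}_1|^2 + |\tilde{s}_2|^2$ times the identity matrix, with zero off-diagonal entries.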

In light of (9.80) and (9.83), we may now summarize three important properties of the 
Alamouti code: 


Unitarity (Complex Orthogonality) 

The Alamouti code is an orthogonal space-time block code, in that its transmission matrix
is a unitary matrix with the sum term $|\tilde{s}_1|^2 + |\tilde{s}_2|^2$ being merely a scaling factor.

As a consequence of this property, the Alamouti code achieves full diversity. 

Full-Rate Complex Code 

The Alamouti code (with two transmit antennas) is the only complex space-time block
code with a code rate of unity in existence.

Hence, for any signal constellation, full diversity of the code is achieved at the full 
transmission rate. 
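Linearity

The Alamouti code is linear in the transmitted symbols.

We may therefore expand the transmission matrix $\mathbf{S}$ of the code as a linear combination of
the transmitted symbols and their complex conjugates, as shown by

$$
\mathbf{S} = \tilde{s}_1\boldsymbol{\Gamma}_{11} + \tilde{s}_1^*\boldsymbol{\Gamma}_{12}
+ \tilde{s}_2\boldsymbol{\Gamma}_{21} + \tilde{s}_2^*\boldsymbol{\Gamma}_{22}
\qquad (9.84)
$$

where the four constituent matrices are themselves defined as follows:

$$
\boldsymbol{\Gamma}_{11} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad
\boldsymbol{\Gamma}_{12} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \quad
\boldsymbol{\Gamma}_{21} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad
\boldsymbol{\Gamma}_{22} = \begin{bmatrix} 0 & 0 \\ -1 & 0 \end{bmatrix}
$$

In words, the Alamouti code is the only two-dimensional space-time code, the
transmission matrix of which can be decomposed into the form described in (9.84).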



Receiver Considerations of the Alamouti Code 

The discussion presented thus far has focused on the Alamouti code viewed from the 
transmitter’s perspective. We turn next to the design of the receiver for decoding the code. 

To this end, we assume that the channel is frequency-flat and slowly time varying, such
that the complex multiplicative distortion introduced by the channel at time $t$ is essentially
the same as that at time $t + T$, where $T$ is the symbol duration. As before, the multiplicative
distortion is denoted by $\alpha_k e^{j\theta_k}$, where we now have $k = 1, 2$, as indicated in Figure 9.26.
Thus, with the symbols $\tilde{s}_1$ and $\tilde{s}_2$ transmitted simultaneously at time $t$, the complex
received signal at some time $t' > t$, allowing for propagation delay, is described by

$$
\tilde{x}_1 = \alpha_1 e^{j\theta_1}\tilde{s}_1 + \alpha_2 e^{j\theta_2}\tilde{s}_2 + \tilde{w}_1 \qquad (9.85)
$$

where $\tilde{w}_1$ is the complex channel noise at time $t'$. Next, with the symbols $-\tilde{s}_2^*$ and $\tilde{s}_1^*$
transmitted simultaneously at time $t + T$, the corresponding complex signal received at
time $t' + T$ is

$$
\tilde{x}_2 = -\alpha_1 e^{j\theta_1}\tilde{s}_2^* + \alpha_2 e^{j\theta_2}\tilde{s}_1^* + \tilde{w}_2 \qquad (9.86)
$$

where $\tilde{w}_2$ is the second complex channel noise at time $t' + T$. To be more precise, the
noise terms $\tilde{w}_1$ and $\tilde{w}_2$ are circularly-symmetric complex-valued uncorrelated Gaussian
random variables of zero mean and equal variance.

In the course of time from $t'$ to $t' + T$, the channel estimator in the receiver has
sufficient time to produce estimates of the multiplicative distortion represented by $\alpha_k e^{j\theta_k}$
for $k = 1, 2$. Hereafter, we assume that these two estimates are accurate enough for them to
be treated as essentially exact; in other words, the receiver has knowledge of both $\alpha_1 e^{j\theta_1}$
and $\alpha_2 e^{j\theta_2}$. Accordingly, we may formulate the combination of two variables, $\tilde{x}_1$ in (9.85)
and the complex conjugate of $\tilde{x}_2$ in (9.86), in matrix form as follows:

$$
\tilde{\mathbf{x}} =
\begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2^* \end{bmatrix}
=
\begin{bmatrix}
\alpha_1 e^{j\theta_1} & \alpha_2 e^{j\theta_2} \\
\alpha_2 e^{-j\theta_2} & -\alpha_1 e^{-j\theta_1}
\end{bmatrix}
\begin{bmatrix} \tilde{s}_1 \\ \tilde{s}_2 \end{bmatrix}
+
\begin{bmatrix} \tilde{w}_1 \\ \tilde{w}_2^* \end{bmatrix}
\qquad (9.87)
$$

The nice thing about this equation is that the original complex signals $\tilde{s}_1$ and $\tilde{s}_2$ appear as
the vector of two unknowns. It is with this goal in mind that $\tilde{x}_1$ and $\tilde{x}_2^*$ were used for the
elements of the two-by-one received signal vector $\tilde{\mathbf{x}}$, in the manner shown in (9.87).

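According to (9.87), the channel matrix of the transmit diversity in Figure 9.26 is
defined by

$$
\mathbf{H} =
\begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}
=
\begin{bmatrix}
\alpha_1 e^{j\theta_1} & \alpha_2 e^{j\theta_2} \\
\alpha_2 e^{-j\theta_2} & -\alpha_1 e^{-j\theta_1}
\end{bmatrix}
\qquad (9.88)
$$

In a manner similar to the signal-transmission matrix $\mathbf{S}$, we find that the channel matrix $\mathbf{H}$
is also a unitary matrix, as shown by

$$
\mathbf{H}^\dagger\mathbf{H} = (\alpha_1^2 + \alpha_2^2)\,\mathbf{I} \qquad (9.89)
$$

where, as before, $\mathbf{I}$ is the identity matrix and the sum term $\alpha_1^2 + \alpha_2^2$ is merely a scaling
factor.

Using the definition of (9.88) for the channel matrix, we may rewrite (9.87) in the
compact matrix form

$$
\tilde{\mathbf{x}} = \mathbf{H}\tilde{\mathbf{s}} + \tilde{\mathbf{w}} \qquad (9.90)
$$

where

$$
\tilde{\mathbf{s}} = \begin{bmatrix} \tilde{s}_1 \\ \tilde{s}_2 \end{bmatrix} \qquad (9.91)
$$

is the complex transmitted signal vector and

$$
\tilde{\mathbf{w}} = \begin{bmatrix} \tilde{w}_1 \\ \tilde{w}_2^* \end{bmatrix} \qquad (9.92)
$$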
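is the additive complex channel noise vector. Note that the column vector $\tilde{\mathbf{s}}$ in (9.91) is the
same as the first row vector in the matrix $\mathbf{S}$ of (9.80).

We have now reached a point where we have to address the fundamental issue in
designing the receiver: how to decode the Alamouti code from the received signal
vector $\tilde{\mathbf{x}}$.

To this end, we introduce a new complex two-by-one vector $\tilde{\mathbf{y}}$, defined as the matrix
product of the received signal vector $\tilde{\mathbf{x}}$ and the Hermitian transpose of the channel matrix
$\mathbf{H}$, normalized by the sum term $\alpha_1^2 + \alpha_2^2$; that is,

$$
\tilde{\mathbf{y}} = \begin{bmatrix} \tilde{y}_1 \\ \tilde{y}_2 \end{bmatrix}
= \frac{1}{\alpha_1^2 + \alpha_2^2}\,\mathbf{H}^\dagger\tilde{\mathbf{x}} \qquad (9.93)
$$

Substituting (9.90) into (9.93) and then making use of the unitarity property of the channel
matrix described in (9.89), we obtain the mathematical basis for decoding of the Alamouti
code:

$$
\tilde{\mathbf{y}} = \tilde{\mathbf{s}} + \tilde{\mathbf{v}} \qquad (9.94)
$$

where $\tilde{\mathbf{v}}$ is a modified form of the complex channel noise $\tilde{\mathbf{w}}$, as shown by

$$
\tilde{\mathbf{v}} = \frac{1}{\alpha_1^2 + \alpha_2^2}\,\mathbf{H}^\dagger\tilde{\mathbf{w}} \qquad (9.95)
$$

Substituting (9.88) and (9.92) into (9.95), the expanded form of the complex noise vector
$\tilde{\mathbf{v}}$ is defined as follows:

$$
\begin{bmatrix} \tilde{v}_1 \\ \tilde{v}_2 \end{bmatrix}
= \frac{1}{\alpha_1^2 + \alpha_2^2}
\begin{bmatrix}
\alpha_1 e^{-j\theta_1}\tilde{w}_1 + \alpha_2 e^{j\theta_2}\tilde{w}_2^* \\
\alpha_2 e^{-j\theta_2}\tilde{w}_1 - \alpha_1 e^{j\theta_1}\tilde{w}_2^*
\end{bmatrix}
\qquad (9.96)
$$

Hence, we may go on to simply write

$$
\tilde{y}_k = \tilde{s}_k + \tilde{v}_k, \qquad k = 1, 2 \qquad (9.97)
$$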

Examination of (9.97) leads us to make the following statement insofar as the receiver is 
concerned: 
This twofold statement hinges on the premise that the receiver has knowledge of the
channel matrix $\mathbf{H}$.

Moreover, with two transmit antennas and one receive antenna, the Alamouti code 
achieves the same level of diversity as a corresponding system with one transmit antenna 
and two receive antennas. It is in this sense that a wireless communication system based 
on the Alamouti code is said to enjoy a two-level diversity gain. 
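The receiver processing described above can be sketched end to end. The code below forms the two received samples from the Alamouti transmissions over a flat, slowly varying channel, and then applies the linear combining with the Hermitian transpose of the channel matrix, normalized by the sum of the squared path gains. The path coefficients `h1` and `h2` stand for $\alpha_1 e^{j\theta_1}$ and $\alpha_2 e^{j\theta_2}$; their values, the noise samples, and the function names are illustrative assumptions, and perfect channel knowledge is assumed.

```python
import cmath

def alamouti_receive(s1, s2, h1, h2, w1=0j, w2=0j):
    """Received samples at t' and t' + T for the Alamouti transmissions."""
    x1 = h1 * s1 + h2 * s2 + w1                              # first symbol period
    x2 = -h1 * s2.conjugate() + h2 * s1.conjugate() + w2     # second symbol period
    return x1, x2

def alamouti_combine(x1, x2, h1, h2):
    """Linear combining y = H^dagger [x1, x2*]^T / (|h1|^2 + |h2|^2)."""
    norm = abs(h1) ** 2 + abs(h2) ** 2
    x2c = x2.conjugate()
    y1 = (h1.conjugate() * x1 + h2 * x2c) / norm
    y2 = (h2.conjugate() * x1 - h1 * x2c) / norm
    return y1, y2

# Illustrative symbols and path coefficients (assumed values).
s1, s2 = complex(1, 1), complex(-1, 1)
h1 = 0.9 * cmath.exp(1j * 0.4)
h2 = 0.6 * cmath.exp(-1j * 1.1)

x1, x2 = alamouti_receive(s1, s2, h1, h2, w1=0.05 - 0.02j, w2=-0.03 + 0.04j)
y1, y2 = alamouti_combine(x1, x2, h1, h2)
print("y1 =", y1, " (sent", s1, ")")
print("y2 =", y2, " (sent", s2, ")")
```

In the absence of noise the combiner outputs equal the transmitted symbols exactly, which is the decoupling expressed by (9.97); with noise, each output is the transmitted symbol plus a scaled noise term.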


Figure 9.27 illustrates the signal-space diagram of an Alamouti-encoded system based on 
the QPSK constellation. The complex Gaussian noise clouds centered on the four signal 
points and with decreasing intensity illustrate the effects of complex noise term v on the 
linear combiner output y. 

In effect, the picture portrayed in Figure 9.27 is the graphical representation of (9.94) over 
two successive symbol transmissions at times t and t+T, repeated a large number of times. 

Suppose that the two signal constellations in the top half of the signal-space diagram in
Figure 9.27 represent the pair of symbols transmitted at time $t$, for which we write

$$
\tilde{\mathbf{s}}_t = \begin{bmatrix} \tilde{s}_1 \\ \tilde{s}_2 \end{bmatrix}
$$

Then, the remaining two signal constellations positioned in the bottom half of Figure 9.27
represent the other pair of symbols transmitted at $t + T$, for which we write

$$
\tilde{\mathbf{s}}_{t+T} = \begin{bmatrix} -\tilde{s}_2^* \\ \tilde{s}_1^* \end{bmatrix}
$$

Signal-space diagram for Alamouti code, using the QPSK signal constellation. The signal
points $\tilde{s}_1$ and $\tilde{s}_2$ and the corresponding linear normalized combiner outputs $\tilde{y}_1$ and $\tilde{y}_2$ are
displayed in the top half of the figure.
On this basis, we may now invoke the maximum likelihood decoding rule, discussed in
Chapter 7, to make the three-fold statement:

1. Compute the composite squared Euclidean distance metric

$$
\|\tilde{\mathbf{y}}_t - \tilde{\mathbf{s}}_t\|^2 + \|\tilde{\mathbf{y}}_{t+T} - \tilde{\mathbf{s}}_{t+T}\|^2
$$

produced by sending signal vectors $\tilde{\mathbf{s}}_t$ and $\tilde{\mathbf{s}}_{t+T}$, respectively.

2. Do this computation for all four possible signal pairs in the QPSK constellation.

3. Hence, the ML decoder selects the pair of signals for which the metric is the
smallest.

The metric’s component $\|\tilde{\mathbf{y}}_t - \tilde{\mathbf{s}}_t\|$ in part 1 of this statement is illustrated in Figure 9.27.


“Multiple-Input, Multiple-Output” Systems: Basic 
Considerations 


In Sections 9.8 and 9.9, we studied space-diversity wireless communication systems
employing either multiple receive or multiple transmit antennas to combat the multipath
fading problem. In effect, fading was treated as a source that degrades performance,
necessitating the use of space diversity on receive or transmit to mitigate it. In this section, we
discuss MIMO wireless communication, which distinguishes itself in the following ways:

The fading phenomenon is viewed not as a nuisance but rather as an environmental 
source of enrichment to be exploited. 

Space diversity at both the transmit and receive ends of the wireless communication 
link may provide the basis for a significant increase in channel capacity. 

Unlike conventional techniques, the increase in channel capacity is achieved by 
increasing computational complexity while maintaining the primary communication 
resources (i.e., total transmit power and channel bandwidth) fixed. 


Figure 9.28 shows the block diagram of a MIMO wireless link. The signals transmitted by
the $N_t$ transmit antennas over the wireless channel are all chosen to lie inside a common
frequency band. Naturally, the transmitted signals are scattered differently by the channel.
Moreover, owing to multiple signal transmissions, the system experiences a spatial form
of signal-dependent interference, called coantenna interference (CAI).

Block diagram of MIMO wireless link with $N_t$ transmit antennas and $N_r$
receive antennas.

Figure 9.29 illustrates the effect of CAI for one, two, and eight simultaneous
transmissions and a single receive antenna (i.e., $N_t = 1, 2, 8$ and $N_r = 1$) using binary PSK;
the transmitted binary PSK signals used in the simulation resulting in this figure were
different but they all had the same average power and occupied the same bandwidth
(Sellathurai and Haykin, 2008). Figure 9.29 clearly shows the difficulty that arises due to
CAI when the number of transmit antennas $N_t$ is large. In particular, with eight
simultaneous signal transmissions, the eye pattern of the received signal is practically
closed. The challenge for the receiver is how to mitigate the CAI problem and thereby
make it possible to provide increased spectral efficiency.

In a theoretical context, the spectral efficiency of a communication system is intimately 
linked to the channel capacity of the system. To proceed with evaluation of the channel 
capacity of MIMO wireless communication, we begin by formulating a baseband channel 
model for the system as described next. 


Consider a MIMO narrowband wireless communication system built around a flat-fading
channel, with $N_t$ transmit antennas and $N_r$ receive antennas. The antenna configuration is
hereafter referred to as the pair $(N_t, N_r)$. For a statistical analysis of the MIMO system in
what follows, we use baseband representations of the transmitted and received signals as
well as the channel. In particular, we introduce the following notation:

• The spatial parameter

$$
N = \min\{N_t, N_r\}
$$

defines new degrees of freedom introduced into the wireless communication system
by using a MIMO channel with $N_t$ transmit antennas and $N_r$ receive antennas.

• The $N_t$-by-1 vector

$$
\tilde{\mathbf{s}}(n) = [\tilde{s}_1(n), \tilde{s}_2(n), \ldots, \tilde{s}_{N_t}(n)]^T
$$

denotes the complex signal vector transmitted by the $N_t$ antennas at discrete time $n$.
The symbols constituting the vector $\tilde{\mathbf{s}}(n)$ are assumed to have zero mean and
common variance $\sigma_s^2$. The total transmit power is fixed at the value

$$
P = N_t \sigma_s^2 \qquad (9.100)
$$

For $P$ to be maintained constant, the variance $\sigma_s^2$ (i.e., power radiated by each
transmit antenna) must be inversely proportional to $N_t$.

• For a flat-fading Rayleigh distributed channel, we may use $\tilde{h}_{ik}(n)$ to denote the
sampled complex gain of the channel coupling transmit antenna $k$ to receive antenna
$i$ at discrete time $n$, where $i = 1, 2, \ldots, N_r$ and $k = 1, 2, \ldots, N_t$. We may thus express
the $N_r$-by-$N_t$ complex channel matrix as

$$
\mathbf{H}(n) =
\begin{bmatrix}
\tilde{h}_{11}(n) & \tilde{h}_{12}(n) & \cdots & \tilde{h}_{1N_t}(n) \\
\tilde{h}_{21}(n) & \tilde{h}_{22}(n) & \cdots & \tilde{h}_{2N_t}(n) \\
\vdots & \vdots & & \vdots \\
\tilde{h}_{N_r 1}(n) & \tilde{h}_{N_r 2}(n) & \cdots & \tilde{h}_{N_r N_t}(n)
\end{bmatrix}
\qquad (9.101)
$$

where the columns correspond to the $N_t$ transmit antennas and the rows to the $N_r$
receive antennas.


• The system of equations

$$
\tilde{x}_i(n) = \sum_{k=1}^{N_t} \tilde{h}_{ik}(n)\tilde{s}_k(n) + \tilde{w}_i(n),
\qquad i = 1, 2, \ldots, N_r; \quad k = 1, 2, \ldots, N_t
\qquad (9.102)
$$

defines the complex signal received at the $i$th antenna due to the transmitted symbol
$\tilde{s}_k(n)$ radiated by the $k$th antenna. The term $\tilde{w}_i(n)$ denotes the additive complex
channel noise perturbing $\tilde{x}_i(n)$. Let the $N_r$-by-1 vector

$$
\tilde{\mathbf{x}}(n) = [\tilde{x}_1(n), \tilde{x}_2(n), \ldots, \tilde{x}_{N_r}(n)]^T \qquad (9.103)
$$

denote the complex received signal vector and the $N_r$-by-1 vector

$$
\tilde{\mathbf{w}}(n) = [\tilde{w}_1(n), \tilde{w}_2(n), \ldots, \tilde{w}_{N_r}(n)]^T \qquad (9.104)
$$

denote the complex channel noise vector. We may then rewrite the system of
equations (9.102) in the compact matrix form

$$
\tilde{\mathbf{x}}(n) = \mathbf{H}(n)\tilde{\mathbf{s}}(n) + \tilde{\mathbf{w}}(n) \qquad (9.105)
$$

Equation (9.105) describes the basic complex channel model for MIMO wireless
communications, assuming the use of a flat-fading channel. The equation describes the
input-output behavior of the channel at discrete time $n$. To simplify the exposition,
hereafter we suppress the dependence on time $n$ by simply writing

$$
\tilde{\mathbf{x}} = \mathbf{H}\tilde{\mathbf{s}} + \tilde{\mathbf{w}} \qquad (9.106)
$$

where it is understood that all four vector/matrix terms of the equation, $\tilde{\mathbf{s}}$, $\mathbf{H}$, $\tilde{\mathbf{w}}$, and $\tilde{\mathbf{x}}$, are
in actual fact dependent on the discrete time $n$. Figure 9.30 shows the basic channel model
of (9.106).

For mathematical tractability, we assume a Gaussian model made up of three elements:

1. $N_t$ symbols, which constitute the transmitted signal vector $\tilde{\mathbf{s}}$ drawn from a white
complex Gaussian codebook; that is, the symbols $\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_{N_t}$ are iid complex
Gaussian random variables with zero mean and common variance $\sigma_s^2$. Hence, the
correlation matrix of the transmitted signal vector $\tilde{\mathbf{s}}$ is defined by

$$
\mathbf{R}_s = E[\tilde{\mathbf{s}}\tilde{\mathbf{s}}^\dagger] = \sigma_s^2 \mathbf{I}_{N_t} \qquad (9.107)
$$

where $\mathbf{I}_{N_t}$ is the $N_t$-by-$N_t$ identity matrix.

2. $N_t \times N_r$ elements of the channel matrix $\mathbf{H}$, which are also drawn from an ensemble of
iid complex random variables with zero mean and unit variance, as shown by the
complex distribution

$$
h_{ik}: \mathcal{N}(0, 1/\sqrt{2}) + j\,\mathcal{N}(0, 1/\sqrt{2}),
\qquad i = 1, 2, \ldots, N_r; \quad k = 1, 2, \ldots, N_t
\qquad (9.108)
$$

where $\mathcal{N}(\cdot\,,\cdot)$ denotes a real Gaussian distribution. On this basis, we find that the
amplitude component $|h_{ik}|$ is Rayleigh distributed. It is in this sense that we
sometimes speak of the MIMO channel as a rich Rayleigh scattering environment.
By the same token, we also find that the squared amplitude component, namely
$|h_{ik}|^2$, is a chi-squared random variable with the mean

$$
E[|h_{ik}|^2] = 1 \quad \text{for all } i \text{ and } k \qquad (9.109)
$$

(The chi-squared distribution is discussed in Appendix A.)

3. $N_r$ elements of the channel noise vector $\tilde{\mathbf{w}}$, which are iid complex Gaussian random
variables with zero mean and common variance $\sigma_w^2$; that is, the correlation matrix
of the noise vector $\tilde{\mathbf{w}}$ is given by

$$
\mathbf{R}_w = E[\tilde{\mathbf{w}}\tilde{\mathbf{w}}^\dagger] = \sigma_w^2 \mathbf{I}_{N_r} \qquad (9.110)
$$

where $\mathbf{I}_{N_r}$ is the $N_r$-by-$N_r$ identity matrix.
Depiction of the basic channel model of (9.106).

In light of (9.100) and the assumption that $h_{ik}$ is a standard Gaussian random variable with
zero mean and unit variance, the average SNR at each receiver input of the MIMO channel
is given by

$$
\rho = \frac{P}{\sigma_w^2} = \frac{N_t \sigma_s^2}{\sigma_w^2} \qquad (9.111)
$$

which is, for a prescribed noise variance $\sigma_w^2$, fixed once the total transmit power $P$ is fixed.
Note also that, first, all the $N_t$ transmitted signals occupy a common channel bandwidth
and, second, the average SNR $\rho$ is independent of $N_r$.

The idealized Gaussian model just described of a MIMO wireless communication
system is applicable to indoor local area networks and other wireless environments, where
the extent of user-terminal mobilities is limited.
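One realization of the Gaussian model can be drawn with a few lines of Python. The sketch below generates a channel matrix with iid complex Gaussian entries of unit variance, a transmitted vector whose per-antenna variance is $P/N_t$, and a noise vector of variance $\sigma_w^2$, and then forms the received vector from the flat-fading input-output relation. The helper names, the parameter values, and the use of Python's `random.gauss` (parameterized by standard deviation) are our choices for illustration.

```python
import math
import random

def complex_gaussian(rng, variance):
    """Circularly symmetric complex Gaussian sample with the given total variance."""
    sd = math.sqrt(variance / 2.0)          # each real component carries half the variance
    return complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))

def mimo_sample(n_t, n_r, total_power, noise_var, rng):
    """Draw H, s, w and return them together with x = H s + w."""
    sigma_s2 = total_power / n_t            # per-antenna power so that P stays fixed
    h = [[complex_gaussian(rng, 1.0) for _ in range(n_t)] for _ in range(n_r)]
    s = [complex_gaussian(rng, sigma_s2) for _ in range(n_t)]
    w = [complex_gaussian(rng, noise_var) for _ in range(n_r)]
    x = [sum(h[i][k] * s[k] for k in range(n_t)) + w[i] for i in range(n_r)]
    return h, s, w, x

rng = random.Random(1)
h, s, w, x = mimo_sample(n_t=3, n_r=2, total_power=1.0, noise_var=0.1, rng=rng)
print("received vector:", x)

# Empirical check that the squared channel gain has unit mean.
gains = [abs(complex_gaussian(rng, 1.0)) ** 2 for _ in range(20000)]
print("sample mean of |h|^2:", sum(gains) / len(gains))
```

With these settings the average SNR is $\rho = P/\sigma_w^2 = 10$, independently of the number of receive antennas.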


MIMO Capacity for Channel Known at the Receiver 


With the basic complex channel model of Figure 9.30 at our disposal, we are now ready to 
focus attention on the primary issue of interest: the channel capacity of a MIMO wireless 
link. In what follows, two special cases will be considered: the first case, entitled “ergodic 
capacity,” assumes that the MIMO channel is weakly (wide-sense) stationary and, therefore, 
ergodic. The second case, entitled “outage capacity,” considers a nonergodic MIMO channel 
under the assumption of quasi-stationarity from one burst of data transmission to the next. 


According to Shannon’s information capacity law discussed in Chapter 5, the capacity of a
real AWGN channel, subject to the constraint of a fixed transmit power $P$, is defined by

$$
C = B \log_2\left(1 + \frac{P}{\sigma_w^2}\right) \quad \text{bits/s} \qquad (9.112)
$$

where $B$ is the channel bandwidth and $\sigma_w^2$ is the noise variance measured over the
bandwidth $B$. Given a time-invariant channel, (9.112) defines the maximum data rate that
can be transmitted over the channel with an arbitrarily small probability of error being
incurred as a result of the transmission. With the channel used $K$ times for the transmission
of $K$ symbols in $T$ seconds, the transmission capacity per unit time is $K/T$ times the
formula for $C$ given in (9.112). Recognizing that $K = 2BT$ in accordance with the sampling
theorem discussed in Chapter 6, we may express the information capacity of the AWGN
channel in the equivalent form

$$
C = \frac{1}{2} \log_2\left(1 + \frac{P}{\sigma_w^2}\right) \quad \text{bits/(s Hz)} \qquad (9.113)
$$

Note that one bit per second per hertz corresponds to one bit per transmission.
With wireless communications as the medium of interest, consider next the case of a
complex flat-fading channel with the receiver having perfect knowledge of the channel
state. The capacity of such a channel is given by

$$
C = E\left[\log_2\left(1 + \frac{|h|^2 P}{\sigma_w^2}\right)\right] \quad \text{bits/(s Hz)} \qquad (9.114)
$$

where the expectation is taken over the gain of the channel $|h|^2$ and the channel is assumed
to be stationary and ergodic. In recognition of this assumption, $C$ is commonly referred to
as the ergodic capacity of the flat-fading channel and the channel coding is applied across
fading intervals (i.e., over an “ergodic” interval of channel variation with time).

It is important to note that the scaling factor of 1/2 is missing from the capacity
formula of (9.114). The reason for this omission is that this equation refers to a complex
baseband channel, whereas (9.113) refers to a real channel. The fading channel covered by
(9.114) operates on a complex signal, namely a signal with in-phase and quadrature
components. Therefore, such a complex channel is equivalent to two real channels with
equal capacities and operating in parallel; hence the result presented in (9.114).

Equation (9.114) applies to the simple case of a single-input, single-output (SISO)
flat-fading channel. Generalizing this formula to the case of a multiple-input,
multiple-output (MIMO) flat-fading channel governed by the Gaussian model described in Figure
9.30, we find that the ergodic capacity of the MIMO channel is given by the following
formula:

$$C = \mathbb{E}\left[\log_2\left\{\frac{\det(\mathbf{R}_w + \mathbf{H}\mathbf{R}_s\mathbf{H}^\dagger)}{\det(\mathbf{R}_w)}\right\}\right] \quad \text{bits/(s Hz)} \tag{9.115}$$

which is subject to the constraint

$$\max_{\mathbf{R}_s} \operatorname{tr}[\mathbf{R}_s] \le P$$

where P is the constant transmit power and tr[·] denotes the trace of the enclosed
matrix. The expectation in (9.115) is over the random channel matrix H, and the
superscript dagger denotes Hermitian transposition; $\mathbf{R}_s$ and $\mathbf{R}_w$ are respectively the
correlation matrices of the transmitted signal vector s and channel noise vector w. A
detailed derivation of (9.115) is presented in Appendix E.

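In general, it is difficult to evaluate (9.115) except for a Gaussian model. In particular,
substituting (9.107) and (9.110) into (9.115) and simplifying yields

$$C = \mathbb{E}\left[\log_2\left\{\det\left(\mathbf{I}_{N_r} + \frac{\sigma_s^2}{\sigma_w^2}\mathbf{H}\mathbf{H}^\dagger\right)\right\}\right] \quad \text{bits/(s Hz)} \tag{9.116}$$

Next, invoking the definition of the average SNR $\rho$ introduced in (9.111), we may rewrite
(9.116) in the equivalent form

$$C = \mathbb{E}\left[\log_2\left\{\det\left(\mathbf{I}_{N_r} + \frac{\rho}{N_t}\mathbf{H}\mathbf{H}^\dagger\right)\right\}\right] \quad \text{bits/(s Hz)}, \quad \text{for } N_t \ge N_r \tag{9.117}$$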
Equation (9.117), defining the ergodic capacity of a MIMO flat-fading channel, involves
the determinant of an $N_r$-by-$N_r$ sum matrix (inside the braces) followed by the logarithm to
base 2. It is for this reason that this equation is referred to as the log-det capacity formula
for a Gaussian MIMO channel.

As indicated in (9.117), the log-det capacity formula therein assumes that $N_t \ge N_r$ for the
matrix product $\mathbf{H}\mathbf{H}^\dagger$ to be of full rank. The alternative case, $N_r \ge N_t$, makes the $N_t$-by-$N_t$
matrix product $\mathbf{H}^\dagger\mathbf{H}$ of full rank, in which case the log-det capacity formula of the
MIMO link takes the form

$$C = \mathbb{E}\left[\log_2\left\{\det\left(\mathbf{I}_{N_t} + \frac{\rho}{N_t}\mathbf{H}^\dagger\mathbf{H}\right)\right\}\right] \quad \text{bits/(s Hz)}, \quad \text{for } N_r \ge N_t \tag{9.118}$$

where, as before, the expectation is taken over the channel matrix H.

Despite the apparent differences between (9.117) and (9.118), they are equivalent in
that either one of them applies to all $\{N_r, N_t\}$ antenna configurations. The two formulas
differentiate themselves only when the full-rank issue is of concern.
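
The equivalence of the two log-det forms can be checked numerically for a single channel realization. The sketch below assumes that both forms carry the same scaling factor $\rho/N_t$ and that the entries of H are i.i.d. complex Gaussian with unit variance; the helper names are hypothetical, and the determinant routine is a plain Gaussian elimination written only for this illustration.

```python
import math
import random


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting (complex entries)."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1 + 0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign of the determinant
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def random_channel(n_r, n_t, rng):
    """N_r-by-N_t matrix of i.i.d. zero-mean, unit-variance complex Gaussian gains."""
    s = math.sqrt(0.5)
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n_t)]
            for _ in range(n_r)]


def capacity_receive_side(h, rho):
    """log2 det(I + (rho/N_t) H H^dagger), an N_r-by-N_r determinant."""
    n_r, n_t = len(h), len(h[0])
    m = [[(1.0 if i == j else 0.0)
          + rho / n_t * sum(h[i][k] * h[j][k].conjugate() for k in range(n_t))
          for j in range(n_r)] for i in range(n_r)]
    return math.log2(det(m).real)


def capacity_transmit_side(h, rho):
    """log2 det(I + (rho/N_t) H^dagger H), an N_t-by-N_t determinant."""
    n_r, n_t = len(h), len(h[0])
    m = [[(1.0 if i == j else 0.0)
          + rho / n_t * sum(h[k][i].conjugate() * h[k][j] for k in range(n_r))
          for j in range(n_t)] for i in range(n_t)]
    return math.log2(det(m).real)


if __name__ == "__main__":
    h = random_channel(2, 4, random.Random(0))
    print(capacity_receive_side(h, 10.0), capacity_transmit_side(h, 10.0))
```

The two values agree to within rounding error for any antenna configuration, which is the determinant identity $\det(\mathbf{I} + \mathbf{A}\mathbf{B}) = \det(\mathbf{I} + \mathbf{B}\mathbf{A})$ at work.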

Clearly, the capacity formula of (9.114), pertaining to a complex, flat-fading link with a
single antenna at both ends of the link, is a special case of the log-det capacity formula.
Specifically, for $N_t = N_r = 1$ (i.e., no spatial diversity), $\rho = P/\sigma_w^2$, and $\mathbf{H} = h$ (with
dependence on discrete-time n suppressed), (9.116) reduces to that of (9.114).

Another insightful result that follows from the log-det capacity formula is that if
$N_t = N_r = N$, then, as N approaches infinity, the capacity C defined in (9.117) grows
asymptotically (at least) linearly with N; that is,

$$\lim_{N \to \infty} \frac{C}{N} \ge \text{constant} \tag{9.119}$$

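In words, the asymptotic formula of (9.119) may be stated as follows:

The ergodic capacity of a MIMO flat-fading link with an equal number N of transmit and
receive antennas grows at least linearly with N.
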
What this statement teaches us is that, by increasing computational complexity resulting 
from the use of multiple antennas at both the transmit and receive ends of a wireless link, 
we are able to increase the spectral efficiency of the link in a far greater manner than is 
possible by conventional means (e.g., increasing the transmit SNR). The potential for this 
very sizable increase in the spectral efficiency of a MIMO wireless communication system 
is attributed to the key parameter 

$$N = \min\{N_t, N_r\}$$

which defines the number of degrees of freedom provided by the system. 
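
The roughly linear growth of capacity with the number of antennas can be observed in a small simulation. The sketch below estimates the ergodic capacity of (9.117) for $N_t = N_r = N$ under an assumed i.i.d. Rayleigh channel at an assumed average SNR of 10 dB; the function names and the trial count are illustrative choices.

```python
import math
import random


def log2_det_hpd(m):
    """log2 det of a Hermitian positive-definite matrix (elimination, no pivoting)."""
    a = [row[:] for row in m]
    n = len(a)
    total = 0.0
    for col in range(n):
        pivot = a[col][col].real  # pivots of such a matrix are real and positive
        total += math.log2(pivot)
        for r in range(col + 1, n):
            factor = a[r][col] / pivot
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return total


def ergodic_capacity(n, rho, trials=300, seed=1):
    """Average of log2 det(I_N + (rho/N) H H^dagger) over random N-by-N channels."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)
    total = 0.0
    for _ in range(trials):
        h = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)]
             for _ in range(n)]
        m = [[(1.0 if i == j else 0.0)
              + rho / n * sum(h[i][k] * h[j][k].conjugate() for k in range(n))
              for j in range(n)] for i in range(n)]
        total += log2_det_hpd(m)
    return total / trials


if __name__ == "__main__":
    for n in (1, 2, 4):
        c = ergodic_capacity(n, 10.0)
        print(n, round(c, 2), round(c / n, 2))  # C/N stays roughly constant
```

Doubling N approximately doubles the estimated capacity, whereas doubling the SNR at fixed N adds only about one bit per degree of freedom.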


Naturally, the log-det capacity formula for the channel capacity of an $\{N_t, N_r\}$ wireless link
includes the channel capacities of receive and transmit diversity links as special cases:

1. Diversity-on-receive channel. The log-det capacity formula (9.118) applies to this
case. Specifically, for $N_t = 1$, the channel matrix H reduces to a column vector and
with it (9.118) reduces to

$$C = \mathbb{E}\left[\log_2\left(1 + \rho\sum_{i=1}^{N_r}|h_i|^2\right)\right] \quad \text{bits/(s Hz)} \tag{9.120}$$

Compared with the channel capacity of (9.114), for an SISO fading channel with
$\rho = P/\sigma_w^2$, the squared channel gain $|h|^2$ is replaced by the sum of squared
magnitudes $|h_i|^2$, i = 1, 2, ..., $N_r$. Equation (9.120) expresses the ergodic capacity
due to the linear combination of the receive-antenna outputs, which is designed to
maximize the information contained in the $N_r$ received signals about the transmitted
signal. This is simply a restatement of the maximal-ratio combining principle
discussed in Section 9.8.

2. Diversity-on-transmit channel. The log-det capacity formula of (9.117) applies to
this second case. Specifically, for $N_r = 1$, the channel matrix H reduces to a row
vector, and with it (9.117) reduces to

$$C = \mathbb{E}\left[\log_2\left(1 + \frac{\rho}{N_t}\sum_{k=1}^{N_t}|h_k|^2\right)\right] \quad \text{bits/(s Hz)} \tag{9.121}$$

where the matrix product $\mathbf{H}\mathbf{H}^\dagger$ is replaced by the sum of squared magnitudes $|h_k|^2$,
k = 1, 2, ..., $N_t$. Compared with case 1 on receive diversity, the capacity of the
diversity-on-transmit channel is reduced because the total transmit power is being
held constant, independent of the number $N_t$ of transmit antennas.
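
The two special cases can be compared numerically. The following sketch estimates the receive-diversity capacity as $\mathbb{E}[\log_2(1 + \rho\sum_i |h_i|^2)]$ and the transmit-diversity capacity as $\mathbb{E}[\log_2(1 + (\rho/N_t)\sum_k |h_k|^2)]$, assuming independent Rayleigh-fading branches of unit average gain; the function names, seed, and trial count are assumptions made for illustration.

```python
import math
import random


def _sum_of_gains(branches, rng):
    """Sum of |h|^2 over independent unit-variance complex Gaussian gains."""
    s = math.sqrt(0.5)
    return sum(rng.gauss(0, s) ** 2 + rng.gauss(0, s) ** 2 for _ in range(branches))


def receive_diversity_capacity(n_r, rho, trials=5000, seed=1):
    """Estimate of E[log2(1 + rho * sum_i |h_i|^2)] for N_r receive antennas."""
    rng = random.Random(seed)
    return sum(math.log2(1.0 + rho * _sum_of_gains(n_r, rng))
               for _ in range(trials)) / trials


def transmit_diversity_capacity(n_t, rho, trials=5000, seed=1):
    """Estimate of E[log2(1 + (rho / N_t) * sum_k |h_k|^2)] for N_t transmit antennas."""
    rng = random.Random(seed)
    return sum(math.log2(1.0 + rho / n_t * _sum_of_gains(n_t, rng))
               for _ in range(trials)) / trials


if __name__ == "__main__":
    rho = 10.0  # assumed average SNR of 10 dB
    print(receive_diversity_capacity(4, rho), transmit_diversity_capacity(4, rho))
```

With four antennas, the receive-diversity estimate exceeds the transmit-diversity estimate because the latter divides the fixed transmit power among the antennas.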


To realize the log-det capacity formula of (9.117), the MIMO channel must be described 
by an ergodic process. In practice, however, the MIMO wireless channel is often 
nonergodic and the requirement is to operate the channel under delay constraints. The 
issue of interest is then summed up as follows: 


In the situation described here, the rate of reliable information transmission (i.e., the strict 
Shannon-sense capacity) is zero, since for any positive rate there exists a nonzero 
probability that the channel would not support such a rate. 

To get around this serious difficulty, the notion of outage is introduced into 
characterization of the MIMO link. (Outage was discussed previously in the context of 
diversity on receive in Section 9.8.) Specifically, we offer the following definition: 


To proceed on this probabilistic basis, it is customary to operate the MIMO link by
transmitting data in the form of bursts or frames and invoke a quasi-stationary model
governed by four points:

1. The burst is long enough to accommodate the transmission of a large number of
symbols, which, in turn, permits the use of an idealized infinite-time horizon basic to
information theory.

2. Yet, the burst is short enough to treat the wireless link as quasi-stationary during
each burst; the slow variation is used to justify the assumption that the receiver has
perfect knowledge of the channel state.

3. The channel matrix is permitted to change, from burst k to the next burst k + 1,
thereby accounting for statistical variations of the link.

4. Different realizations of the transmitted signal vector s are drawn from a white
Gaussian codebook; that is, the correlation matrix of s is defined by (9.107).

Points 1 and 4 pertain to signal transmission, whereas points 2 and 3 pertain to the MIMO 
channel itself. 

To proceed with the evaluation of outage probability under this model, we first note
that, in light of the log-det capacity formula (9.117), we may view the random variable

$$C_k = \log_2\left\{\det\left(\mathbf{I}_{N_r} + \frac{\rho}{N_t}\mathbf{H}_k\mathbf{H}_k^\dagger\right)\right\} \quad \text{bits/(s Hz) for burst } k \tag{9.122}$$

as the expression for a “sample realization” of the MIMO link. In other words, with the
random-channel matrix $\mathbf{H}_k$ varying from one burst to the next, $C_k$ will itself vary in a
corresponding way. A consequence of this random behavior is that, occasionally, a sample
drawn from the cumulative distribution function of the MIMO link results in a value for $C_k$
that is inadequate to support reliable communication over the link. In this kind of situation
the link is said to be in an outage state. Correspondingly, for a given transmission strategy,
we define the outage probability at rate R as

$$P_{\text{outage}}(R) = \mathbb{P}\{C_k < R\} \quad \text{for some burst } k \tag{9.123}$$

Equivalently, we may write

$$P_{\text{outage}}(R) = \mathbb{P}\left[\log_2\left\{\det\left(\mathbf{I}_{N_r} + \frac{\rho}{N_t}\mathbf{H}_k\mathbf{H}_k^\dagger\right)\right\} < R \text{ for some burst } k\right] \tag{9.124}$$

On this basis, we may offer the following definition:


By the very nature of it, the study of outage capacity can only be conducted using Monte 
Carlo simulation. 
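
A Monte Carlo study of this kind can be sketched as follows for a hypothetical 2-by-2 link. The sketch assumes an i.i.d. Rayleigh channel matrix that is redrawn for each burst, evaluates the per-burst quantity $\log_2\det(\mathbf{I} + (\rho/N_t)\mathbf{H}_k\mathbf{H}_k^\dagger)$ through the closed-form determinant of a 2-by-2 Hermitian matrix, and then reads the outage probability and an outage capacity off the empirical distribution. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def burst_capacity_2x2(rho, rng):
    """Sample of log2 det(I + (rho/N_t) H H^dagger) for one burst, N_t = N_r = 2."""
    s = math.sqrt(0.5)
    h = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(2)] for _ in range(2)]
    scale = rho / 2.0
    # Entries of the 2-by-2 Hermitian matrix [[a, b], [conj(b), d]].
    a = 1.0 + scale * (abs(h[0][0]) ** 2 + abs(h[0][1]) ** 2)
    d = 1.0 + scale * (abs(h[1][0]) ** 2 + abs(h[1][1]) ** 2)
    b = scale * (h[0][0] * h[1][0].conjugate() + h[0][1] * h[1][1].conjugate())
    return math.log2(a * d - abs(b) ** 2)


def simulate_bursts(rho, bursts, seed=1):
    rng = random.Random(seed)
    return [burst_capacity_2x2(rho, rng) for _ in range(bursts)]


def outage_probability(samples, rate):
    """Fraction of bursts whose sample capacity falls below the target rate."""
    return sum(1 for c in samples if c < rate) / len(samples)


def outage_capacity(samples, epsilon):
    """A rate whose empirical outage probability does not exceed epsilon."""
    ordered = sorted(samples)
    return ordered[int(epsilon * len(ordered))]


if __name__ == "__main__":
    samples = simulate_bursts(10.0, 2000)
    print(outage_probability(samples, 4.0), outage_capacity(samples, 0.10))
```

The empirical distribution of the samples plays the role of the cumulative distribution function mentioned in the text; more bursts give a smoother estimate at small outage probabilities.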


The log-det capacity formula of (9.117) is based on the premise that the transmitter has no
knowledge of the channel state. Knowledge of the channel state, however, can be made
available to the transmitter by first estimating the channel matrix H at the receiver and
then sending this estimate to the transmitter via a feedback channel. In such a scenario, the
capacity is optimized over the correlation matrix of the transmitted signal vector s, subject 
to the power constraint; that is, the trace of this correlation matrix is less than or equal to 
the constant transmit power P. Naturally, formulation of the log-det capacity formula of a 
MIMO channel for which the channel is known in both the transmitter and receiver is 
more challenging than when it is only known to the receiver. For details of this 
formulation, the reader is referred to Appendix E. 

Orthogonal Frequency Division Multiplexing 


In Chapter 8 we introduced the DMT method as one discrete form of multichannel
modulation for signaling over band-limited channels. Orthogonal frequency division
multiplexing (OFDM) is another closely related form of multifrequency modulation.

OFDM is particularly well suited for high data-rate transmission over delay-dispersive 
channels. In its own way, OFDM solves the problem by following the engineering 
paradigm of “divide and conquer.” Specifically, a large number of closely spaced 
orthogonal subcarriers (tones) is used to support the transmission. Correspondingly, the 
incoming data stream is divided into a number of low data-rate substreams, one for each 
carrier, with the subchannels so formed operating in parallel. For the modulation process, 
a modulation scheme such as QPSK is used. 

What we have just briefly described here is essentially the same as the procedure used 
in DMT modulation. In other words, the underlying mathematical theory of DMT 
described in Chapter 8 applies equally well to OFDM, except for the fact that the signal 
constellation encoder does not include the use of loading for bit allocation. In addition, 
two other changes have to be made in the implementation of OFDM: 

In the transmitter, an upconverter is included after the digital-to-analog converter to
appropriately translate the transmitted frequency, so as to facilitate propagation of
the transmitted signal over the radio channel.

In the receiver, a downconverter is included before the analog-to-digital converter to 
undo the frequency translation that was performed by the upconverter in the 
transmitter. 
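
The baseband core of this procedure can be sketched in a few lines: QPSK symbols are placed on orthogonal subcarriers with an inverse DFT and recovered with a forward DFT. The sketch below is a simplified illustration that omits the channel, the cyclic prefix, the converters, and the up/down-conversion stages; the bit-to-symbol mapping, the function names, and the choice of eight subcarriers are assumptions, and a direct DFT is used in place of the FFT for brevity.

```python
import cmath


def idft(freq):
    """Inverse DFT: places freq[k] on the kth orthogonal subcarrier."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def dft(time):
    """Forward DFT: correlates the block with each subcarrier."""
    n = len(time)
    return [sum(time[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def qpsk_map(bits):
    """Map bit pairs to QPSK points (assumed mapping: 0 -> +1, 1 -> -1 on each axis)."""
    return [complex(1 - 2 * bits[i], 1 - 2 * bits[i + 1]) for i in range(0, len(bits), 2)]


def qpsk_demap(symbols):
    bits = []
    for s in symbols:
        bits += [0 if s.real > 0 else 1, 0 if s.imag > 0 else 1]
    return bits


def ofdm_modulate(bits):
    """One OFDM block: len(bits) / 2 subcarriers, one QPSK symbol per subcarrier."""
    return idft(qpsk_map(bits))


def ofdm_demodulate(samples):
    return qpsk_demap(dft(samples))


if __name__ == "__main__":
    data = [0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1]  # 16 bits, 8 subcarriers
    print(ofdm_demodulate(ofdm_modulate(data)) == data)
```

Each substream occupies one subcarrier, and the orthogonality of the subcarriers over the block is what allows the forward DFT to separate them without mutual interference.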

Figure 9.31 shows the block diagram of an OFDM system, the components of which are 
configured to accommodate the transmission of a binary data stream at 36 Mbit/s as an 
illustrative example. Parts a and b of the figure depict the transmitter and receiver of the 
system, respectively. Specifically, pertinent values of data carrier rates as well as
subcarrier frequencies at the various functional blocks are included in part a of the figure
dealing with the transmitter. One last comment is in order: the front end of the transmitter 
and the back end of the receiver are allocated to forward error-correction encoding and 
decoding, respectively, for improved reliability of the system. (Error-control coding of the 
forward error-correction variety is discussed in Chapter 10.) 


Figure 9.31 Block diagram of the typical implementation of an OFDM system, illustrating the
transmission of binary data at 36 Mbit/s: (a) transmitter; (b) receiver.

A compelling practical importance of OFDM to wireless communications is attributed to
the computational benefits brought about by the FFT algorithm that plays a key role in its
implementation. However, OFDM suffers from the so-called peak-to-average power ratio
(PAPR) problem. This problem arises due to the statistical probability of a large number
of independent subchannels in the OFDM signal becoming superimposed on each other in
some unknown fashion, thereby resulting in high peaks. For a detailed account of the PAPR
problem and how to mitigate it, the reader is referred to Appendix G.
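
The origin of the high peaks can be seen in a small calculation. The sketch below computes the ratio of peak to average power of one critically sampled OFDM block and evaluates it for two extreme cases: all subcarriers carrying the same symbol, which add coherently at one instant, and a single active subcarrier, which has a constant envelope. The function names and the block size of 16 subcarriers are assumptions for illustration.

```python
import cmath
import math


def idft(freq):
    """Inverse DFT used here as the OFDM modulator."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def papr(samples):
    """Ratio of peak instantaneous power to average power of one block."""
    powers = [abs(x) ** 2 for x in samples]
    return max(powers) / (sum(powers) / len(powers))


if __name__ == "__main__":
    n = 16
    aligned = idft([1 + 0j] * n)                       # all subcarriers in phase
    single = idft([0j, 1 + 0j] + [0j] * (n - 2))       # one active subcarrier
    print(10 * math.log10(papr(aligned)), 10 * math.log10(papr(single)))
```

For N subcarriers adding in phase, the ratio reaches N (about 12 dB for N = 16), whereas a single subcarrier gives a ratio of one.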


Spread Spectrum Signals 


In previous sections of this chapter we described different methods for mitigating the 
effect of multipath interference in signaling over fading channels. In this section of the 
chapter, we describe another novel way of thinking about wireless communications, which 
is based on a class of signals called spread spectrum signals. 

A signal is said to belong to this class of signals if it satisfies the following two 
requirements: 

Spreading. Given an information-bearing signal, spreading of the signal is
accomplished in the transmitter by means of an independent spreading signal, such
that the resulting spread spectrum signal occupies a bandwidth much larger than the
bandwidth of the original information-bearing signal: the larger the better.

Despreading. Given a noisy version of the transmitted spread spectrum signal,
despreading (i.e., recovering the original information-bearing signal) is achieved by
correlating the received signal with a synchronized replica of the spreading signal in
the receiver.
In effect, the information-bearing signal is spread (increased) in bandwidth before its
transmission over the channel, and the received signal at the channel output is despread 
(i.e., decreased) in bandwidth by the same amount. 

To explain the rationale of spread spectrum signals, consider, first, a scenario where 
there are no interfering signals at the channel output whatsoever. In this idealized scenario, 
an exact replica of the original information-bearing signal is reproduced at the receiver 
output; this recovery follows from the combined action of spreading and despreading, in 
that order. We may thus say that the receiver performance is transparent with respect to 
the combined spreading-despreading process. 

Consider, next, a practical scenario where an additive narrowband interference is 
introduced at the receiver input. Since the interfering signal is introduced into the 
communication system after transmission of the information-bearing signal, its bandwidth 
is increased by the spreading signal in the receiver, with the result that its power spectral 
density is correspondingly reduced. Typically, at its output end, the receiver includes a 
filter whose bandwidth-occupancy matches that of the information-bearing signal. 
Consequently, the average power of the interfering signal is reduced, and the output SNR 
of the receiver is increased; hence, there is practical benefit in improved SNR to be gained 
from using the spread spectrum technique when there is an interfering signal (e.g., due to 
multipath) to deal with. Of course, this benefit is obtained at the expense of increased 
channel bandwidth. 
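
The interference-suppression argument above can be illustrated with a baseband toy model. In the sketch below, each data symbol (+1 or -1) is multiplied by one period of a length-7 binary PN sequence, a constant interferer three times stronger than a chip is added at the receiver input, and the receiver correlates with a synchronized replica of the PN sequence. The sequence length, the interferer level, and all function names are assumptions chosen to keep the example small.

```python
def pn_chips():
    """One period of a length-7 maximal-length sequence, mapped 0 -> +1, 1 -> -1."""
    bits = [1, 0, 0]
    while len(bits) < 7:
        bits.append(bits[-2] ^ bits[-3])  # recurrence s[n] = s[n-2] XOR s[n-3]
    return [1 - 2 * b for b in bits]


def spread(symbols, chips):
    """Multiply each +/-1 data symbol by the whole chip sequence."""
    return [s * c for s in symbols for c in chips]


def despread(received, chips):
    """Correlate each symbol interval with the chip replica and take the sign."""
    n = len(chips)
    decisions = []
    for start in range(0, len(received), n):
        corr = sum(r * c for r, c in zip(received[start:start + n], chips))
        decisions.append(1 if corr > 0 else -1)
    return decisions


if __name__ == "__main__":
    chips = pn_chips()
    symbols = [1, -1, -1, 1]
    interference = 3.0  # constant (narrowband) interferer, stronger than one chip
    received = [x + interference for x in spread(symbols, chips)]
    print(despread(received, chips))                               # spread system
    print([1 if s + interference > 0 else -1 for s in symbols])    # unspread system
```

In the spread system the correlator accumulates the desired symbol over seven chips, while the interferer is multiplied by the nearly balanced chip sequence and largely cancels; the unspread receiver, by contrast, is swamped by the interferer.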


Depending on how the use of spread spectrum signals is carried out, we may classify them 
as follows: 

Direct Sequence-Spread Spectrum 

One method of spreading the bandwidth of an information-bearing signal is to use 
the so-called direct sequence-spread spectrum (DS-SS), wherein a pseudo-noise 
(PN) sequence is employed as the spreading sequence (signal). The PN sequence is 
a periodic binary sequence with noise-like properties, details of which are 
presented in Appendix J. The baseband modulated signal, representative of the 
DS-SS method, is obtained by multiplying the information-bearing signal by the 
PN sequence, whereby each information bit is chopped into a number of small 
time increments, called chips. The second stage of modulation is aimed at 
conversion of the baseband DS-SS signal into a form suitable for transmission over 
a wireless channel, which is accomplished by using M-ary PSK, discussed in
Chapter 7. The family of spread spectrum systems so formed is referred to simply
as DS/MPSK systems, a distinct characteristic of which is that spreading of the
transmission bandwidth takes place instantaneously. Moreover, the signal-processing
capability of these systems to combat the effect of interferers,
commonly referred to as jammers, be they friendly or unfriendly, is a function of
the PN sequence length. Unfortunately, this capability is limited by physical 
considerations of the PN-sequence generator. 

Frequency Hop-Spread Spectrum 

To overcome the physical limitations of DS/MPSK systems, we may resort to 
alternative methods. One such method is to force the jammer to occupy a wider 
spectrum by randomly hopping the input data-modulated carrier from one frequency
to the next. In effect, the spectrum of the transmitter signal is spread sequentially
rather than instantaneously; the term sequentially refers to the pseudo-randomly
ordered sequence of frequency hops. This second type of spread spectrum in which
the carrier hops randomly from one frequency to another is called frequency hop-spread
spectrum. A modulation format commonly used herein is that of M-ary
FSK, which was also discussed in Chapter 7. The combination of the two
modulation techniques, namely frequency hopping and M-ary FSK, is referred to
simply as FH/MFSK. Since frequency hopping does not cover the entire spread
spectrum instantaneously, we are led to consider the rate at which the hops occur. In
this context, we may go on to identify two basic kinds of frequency hopping, which
are the converse of each other, as summarized here:

• First, slow-frequency hopping, in which the symbol rate of the M-ary FSK signal,
denoted by $R_s$, is an integer multiple of the hop rate, denoted by $R_h$; that is, several
symbols of the input data sequence are transmitted for each frequency hop.

• Second, fast-frequency hopping, in which the hop rate $R_h$ is an integer multiple of
the M-ary FSK symbol rate $R_s$; that is, the carrier frequency will change (i.e.,
hop) several times during the transmission of one input-data symbol.

The spread spectrum technique of the FH variety is particularly attractive for 
military applications. But, compared with the alternative spread spectrum technique, 
DS/MPSK, the commercial use of FH/MFSK is insignificant, which is especially so 
in regard to fast frequency hopping. The limiting factor behind this statement is the 
expense involved in the employment of frequency synthesizers, which are basic to 
the implementation of FH/MFSK systems. Accordingly, the FH/MFSK will not be 
considered further. 


Before closing this section on spread spectrum signals, it is informative to expand on the
improvement in SNR gained at the receiver output, mentioned earlier on. To this end,
consider the simple case of the DS/BPSK, in which the binary PSK, representing the
second stage of modulation in the transmitter, is coherent; that is, the receiver is
synchronized with the transmitter in all of its features. In Problem 9.34, it is shown that the
processing gain of a spread spectrum signal compared to its unspread version is

$$\text{PG} = \frac{T_b}{T_c} \tag{9.125}$$

where $T_b$ is the bit duration and $T_c$ is the chip duration. With PG expressed in decibels, in
Problem 9.34 it is also shown that

$$10\log_{10}(\text{SNR})_O = 10\log_{10}(\text{SNR})_I + 10\log_{10}(\text{PG}) \quad \text{dB} \tag{9.126}$$

where $(\text{SNR})_I$ and $(\text{SNR})_O$ are the input SNR and output SNR, respectively. Furthermore,
recognizing that the ratio $T_b/T_c$ is equal to the number of chips contained in a single bit
duration, it follows that the processing gain realized by the use of DS/BPSK increases
with increasing length of a single period of the PN sequence, which was emphasized
previously.
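
As a quick numerical check of these two relations, the sketch below computes the processing gain as the ratio of bit duration to chip duration and adds it, in decibels, to an input SNR. The durations and the input SNR are assumed values, and the function names are hypothetical.

```python
import math


def processing_gain(bit_duration, chip_duration):
    """Processing gain PG = T_b / T_c, the number of chips per bit."""
    return bit_duration / chip_duration


def output_snr_db(input_snr_db, bit_duration, chip_duration):
    """Output SNR in dB: input SNR in dB plus 10 log10(PG)."""
    return input_snr_db + 10.0 * math.log10(processing_gain(bit_duration, chip_duration))


if __name__ == "__main__":
    # Assumed values: T_b = 1 ms, T_c = 1 microsecond, input SNR of -5 dB.
    print(processing_gain(1e-3, 1e-6))
    print(output_snr_db(-5.0, 1e-3, 1e-6))
```

With 1000 chips per bit the processing gain is 30 dB, so an input SNR of -5 dB becomes an output SNR of about 25 dB.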
Code-Division Multiple Access


Modern wireless networks are commonly of a multiuser type, in that the multiple 
communication links within the network are shared among multiple users. Specifically, 
each individual user is permitted to share the available radio resources (i.e., time and 
frequency) with other users in the network and do so in an independent manner. 

Stated in another way, a multiple access technique permits the radio resources to be 
shared among multiple users seeking to communicate with each other. In the context of 
time and frequency domains, we recall from Chapter 1 that frequency-division multiple 
access (FDMA) and time-division multiple access (TDMA) techniques allocate the radio 
resources of a wireless channel through the use of disjointedness (i.e., orthogonality) in 
frequency and time, respectively. On the other hand, the code-division multiple access 
(CDMA) technique, building on spread spectrum signals and benefiting from their 
attributes, provides an alternative to the traditional techniques of FDMA and TDMA; it
requires neither the bandwidth allocation of FDMA nor the time synchronization
needed in TDMA. Rather, CDMA operates on the following principle:


This statement is testimony to what we said in the first paragraph of Section 9.13, namely 
that spread spectrum signals provide a novel way of thinking about wireless 
communications. 

To elaborate on the way in which CDMA distinguishes itself from FDMA and TDMA 
in graphical terms, consider Figure 9.32. Parts a and b of the figure depict the ways in 
which the radio resources are distributed in FDMA and TDMA, respectively. To be 
specific: 

• In FDMA, the channel bandwidth B is divided equally among a total number of K 
users, with each user being allotted a subband of width B/K and having the whole 
time resource T at its disposal. 

• In TDMA, the time resource T is divided equally among the K users, with each user
having total access to the frequency resource, namely the total channel bandwidth B,
but for only T/K in each time frame.

In a way, we may therefore think of FDMA and TDMA as the dual of each other. 

Turning next to Figure 9.32c, we see that CDMA operates in a manner entirely 
different from both FDMA and TDMA. Graphically, we see that each CDMA user has full 
access to the entire radio resources at every point in time from one frame to the next. 
Nevertheless, for the full utilization of radio resources to be achievable, it is necessary that 
the spreading codes assigned to all the K users form an orthogonal set. 

In other words, orthogonality is a requirement common to FDMA, TDMA, and
CDMA, each in its own specific way. However, this requirement is easier to implement
practically in FDMA and TDMA than it is in CDMA.

Figure 9.32 Resource distribution in (a) FDMA, (b) TDMA, and (c) CDMA. This figure shows the
essence of multiple access as in Figure 1.2 with a difference: Figure 9.32 is quantitative in its
description of multiple-access techniques.

In an ideal CDMA system, to satisfy the orthogonality requirement, the cross-correlation
between any two users of the system must be zero. Correspondingly, for this
ideal condition to be satisfied, we require that the cross-correlation function between the
spreading sequences (codes) assigned to any two CDMA users of the system must be zero
for all cyclic shifts in time. Unfortunately, ordinary PN sequences do not satisfy the
orthogonality requirement because of their relatively poor cross-correlation properties.

Accordingly, we have to look to alternative spreading codes to satisfy the orthogonality 
requirements. Fortunately, such an endeavor is mathematically feasible, depending on 
whether synchrony of the CDMA receiver to its transmitter is required or not. In what 
follows, we describe the use of Walsh-Hadamard sequences for the synchronous case and 
Gold sequences for the asynchronous case. 


Consider the case of a CDMA system, for which synchronization among users of the
system is permissible. Under this condition, perfect orthogonality of two spreading
signals, $c_j(t)$ and $c_k(t)$, respectively assigned to users j and k for different time offsets,
namely

$$R_{jk}(\tau) = \int_{-\infty}^{\infty} c_j(t)\,c_k^*(t - \tau)\,dt = 0 \quad \text{for } j \ne k \tag{9.127}$$

reduces to

$$R_{jk}(0) = \int_{-\infty}^{\infty} c_j(t)\,c_k^*(t)\,dt = 0 \quad \text{for } j \ne k \text{ and } \tau = 0 \tag{9.128}$$

where the asterisk denotes complex conjugation. It turns out that, for the special case
described in (9.128), the orthogonality requirement can be satisfied exactly, and the
resulting sequences are known as the Walsh-Hadamard sequences (codes).




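To construct a Walsh-Hadamard sequence, we begin with a 2 × 2 matrix, denoted by
$\mathbf{H}_2$, for which the inner product of its two rows (or two columns) is zero. For example, we
may choose the matrix

$$\mathbf{H}_2 = \begin{bmatrix} +1 & +1 \\ +1 & -1 \end{bmatrix} \tag{9.129}$$

the two rows of which are indeed orthogonal to each other. To go on and construct a
Walsh-Hadamard sequence of length 4 using $\mathbf{H}_2$, we construct the Kronecker product of
$\mathbf{H}_2$ with itself, as shown by

$$\mathbf{H}_4 = \mathbf{H}_2 \otimes \mathbf{H}_2 \tag{9.130}$$

To explain what we mean by the Kronecker product in a generic sense, let $\mathbf{A} = \{a_{ij}\}$ and
$\mathbf{B} = \{b_{jk}\}$ denote m × m and n × n matrices, respectively. Then, we may introduce the
following rule:

The Kronecker product of A and B, denoted by $\mathbf{A} \otimes \mathbf{B}$, is the mn × mn matrix obtained by
replacing each element $a_{ij}$ of A with the scaled matrix $a_{ij}\mathbf{B}$.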

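Construction of Hadamard-Walsh $\mathbf{H}_4$ from $\mathbf{H}_2$

For the example of (9.129) on matrix $\mathbf{H}_2$, applying the Kronecker product rule, we may
express the $\mathbf{H}_4$ of (9.130) as follows:

$$\mathbf{H}_4 = \begin{bmatrix} +1 \times \mathbf{H}_2 & +1 \times \mathbf{H}_2 \\ +1 \times \mathbf{H}_2 & -1 \times \mathbf{H}_2 \end{bmatrix} = \begin{bmatrix} +1 & +1 & +1 & +1 \\ +1 & -1 & +1 & -1 \\ +1 & +1 & -1 & -1 \\ +1 & -1 & -1 & +1 \end{bmatrix} \tag{9.131}$$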

The four rows (and columns) of $\mathbf{H}_4$ defined in (9.131) are indeed orthogonal to each other.

Carrying on in this manner, we may go on to construct the Hadamard-Walsh sequences
$\mathbf{H}_8$, $\mathbf{H}_{16}$, and so on.
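
The recursive construction can be sketched directly from the Kronecker product. In the code below, `kron` and `hadamard` are hypothetical helper names, and the orders are restricted to powers of two, which is what repeated Kronecker products of the 2 × 2 matrix produce.

```python
def kron(a, b):
    """Kronecker product: each element a[i][j] is replaced by the block a[i][j] * b."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def hadamard(order):
    """Hadamard matrix of the given order (a power of two), built from the 2 x 2 case."""
    h2 = [[1, 1], [1, -1]]
    h = [[1]]
    while len(h) < order:
        h = kron(h2, h)
    return h


def rows_orthogonal(h):
    """True when every pair of distinct rows has zero inner product."""
    n = len(h)
    return all(sum(x * y for x, y in zip(h[i], h[j])) == 0
               for i in range(n) for j in range(i + 1, n))


if __name__ == "__main__":
    for row in hadamard(4):
        print(row)
    print(rows_orthogonal(hadamard(8)))
```

Each doubling of the order doubles both the number of available codes and the code length, so a system with K synchronous users needs codes of length at least K.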


In practical terms, a synchronous CDMA system is achievable provided that a single 
transmitter (e.g., the base station of a cellular network) transmits individual data streams 
simultaneously, with each data stream being addressed to a specific CDMA user (e.g., 
mobile unit). 
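
A synchronous downlink of this kind can be mimicked with the four length-4 Walsh-Hadamard codes. In the sketch below, a hypothetical base station sums the spread symbols of four users chip by chip, and each mobile recovers its own symbol by correlating with its code. The last lines illustrate, under the same assumptions, what happens when chip alignment is lost: a one-chip cyclic shift of one code becomes indistinguishable from the negative of another.

```python
WALSH_4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def transmit(symbols):
    """Composite chip sequence: sum over users of symbol times that user's code."""
    return [sum(symbols[u] * WALSH_4[u][i] for u in range(4)) for i in range(4)]


def receive(chips, user):
    """Correlate the composite signal with one user's code and normalize."""
    return sum(c * w for c, w in zip(chips, WALSH_4[user])) / 4


def correlation(a, b):
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    sent = [1, -1, -1, 1]
    composite = transmit(sent)
    print([receive(composite, u) for u in range(4)])
    # Loss of synchronism: shift user 2's code cyclically by one chip.
    shifted = WALSH_4[2][-1:] + WALSH_4[2][:-1]
    print(correlation(shifted, WALSH_4[3]))
```

With perfect alignment every user recovers its own symbol exactly, whereas the shifted code correlates fully with another user's code, which is why these codes suit the synchronous case.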


Whereas Walsh-Hadamard sequences are well suited for synchronous CDMA, Gold
sequences, on the other hand, are well suited for applications in asynchronous CDMA;
therein, time- and phase-shifts between individual user signals, measured with respect to
the base station in a cellular network, occur in a random manner; hence the adoption of
asynchrony.

Gold sequences constitute a special class of sequences derived from maximal-length
sequences, the generation of which is embodied in Gold's theorem, stated as follows:


To understand Gold’s theorem, we need to define what we mean by a primitive
polynomial. Consider a polynomial g(X) defined over a binary field (i.e., a finite set of two
elements, 0 and 1, which is governed by the rules of binary arithmetic). The polynomial
g(X) is said to be an irreducible polynomial if it cannot be factored using any polynomials
from the binary field. An irreducible polynomial g(X) of degree m is said to be a primitive
polynomial if the smallest integer n for which the polynomial g(X) divides the factor
$X^n + 1$ is $n = 2^m - 1$. The topic of primitive polynomials is discussed in Chapter 10 on
error-control coding.
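
The definition translates into a short test. The sketch below represents a binary polynomial as an integer bit mask (bit i holds the coefficient of $X^i$, an encoding assumed here for convenience), finds the smallest n for which g(X) divides $X^n + 1$ by repeatedly multiplying by X modulo g(X), and compares it with $2^m - 1$. The function names are hypothetical.

```python
def order_of_x(poly):
    """Smallest n >= 1 such that poly divides X^n + 1 over the binary field.

    poly is an integer bit mask: bit i holds the coefficient of X^i.
    Returns None if poly has no constant term, since no such n then exists.
    """
    degree = poly.bit_length() - 1
    if degree < 1 or (poly & 1) == 0:
        return None
    remainder = 1  # the polynomial 1, i.e. X^0
    for n in range(1, 2 ** degree):
        remainder <<= 1                    # multiply by X
        if (remainder >> degree) & 1:
            remainder ^= poly              # reduce modulo poly
        if remainder == 1:                 # X^n = 1 (mod poly)
            return n
    return None


def is_primitive(poly):
    """True when the smallest n equals 2^m - 1, which also implies irreducibility."""
    degree = poly.bit_length() - 1
    return order_of_x(poly) == 2 ** degree - 1


if __name__ == "__main__":
    print(order_of_x(0b1011), is_primitive(0b1011))      # X^3 + X + 1
    print(order_of_x(0b11111), is_primitive(0b11111))    # X^4 + X^3 + X^2 + X + 1
```

The second polynomial is irreducible but divides $X^5 + 1$, so it is not primitive; the first divides $X^n + 1$ for no n smaller than 7 and is therefore primitive.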


Correlation Properties of Gold Codes 

As an illustrative example, consider Gold sequences with period $2^7 - 1 = 127$. To generate such
a sequence for n = 7 we need a preferred pair of PN sequences that satisfy (9.132) (n odd), as
shown by

$$2^{(n+1)/2} + 1 = 2^4 + 1 = 17$$

This requirement is satisfied by the Gold-sequence generator shown in Figure 9.33 that
involves the modulo-2 addition of these two sequences. According to Gold’s theorem,
there are a total of

$$2^n + 1 = 2^7 + 1 = 129$$

sequences that satisfy (9.132). The cross-correlation between any pair of such sequences is
shown in Figure 9.34, which is indeed in full accord with Gold’s theorem. In particular,
the magnitude of the cross-correlation is less than or equal to 17.
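
The example can be reproduced with two software shift registers. The sketch below reads the tap lists [7,4] and [7,6,5,4] as the recurrences $s[n] = s[n-7] \oplus s[n-4]$ and $s[n] = s[n-7] \oplus s[n-6] \oplus s[n-5] \oplus s[n-4]$; this reading of the tap notation, the all-ones initial register contents, and the function names are assumptions. The family is formed from the two PN sequences together with the modulo-2 sums of the first with every cyclic shift of the second.

```python
def m_sequence(taps, length):
    """Binary sequence from the recurrence s[n] = XOR of s[n - t] over t in taps."""
    seq = [1] * max(taps)  # any nonzero initial register contents will do
    while len(seq) < length:
        bit = 0
        for t in taps:
            bit ^= seq[-t]
        seq.append(bit)
    return seq[:length]


def cross_correlation(u, v):
    """Periodic cross-correlation after mapping 0 -> +1 and 1 -> -1."""
    n = len(u)
    return [sum(1 if u[i] == v[(i + shift) % n] else -1 for i in range(n))
            for shift in range(n)]


def gold_family(u, v):
    """The two PN sequences plus the modulo-2 sum of u with each cyclic shift of v."""
    n = len(u)
    family = [u, v]
    for shift in range(n):
        family.append([u[i] ^ v[(i + shift) % n] for i in range(n)])
    return family


if __name__ == "__main__":
    u = m_sequence([7, 4], 127)
    v = m_sequence([7, 6, 5, 4], 127)
    print(len(gold_family(u, v)))
    print(sorted(set(cross_correlation(u, v))))
```

Under these assumptions the family has 129 members, and the cross-correlation of the two PN sequences takes only the values -17, -1, and 15, so its magnitude never exceeds 17.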
Figure 9.33 Generator for a Gold sequence of period $2^7 - 1 = 127$.

Figure 9.34 Cross-correlation function $R_{12}$ of a pair of Gold sequences based on the two PN
sequences [7,4] and [7,6,5,4].


The RAKE Receiver and Multipath Diversity 


A discussion of wireless communications using CDMA would be incomplete without a 
description of the RAKE receiver. The RAKE receiver was originally developed in the 
1950s as a diversity receiver designed expressly to equalize the effect of multipath. First, 
and foremost, it is recognized that useful information about the transmitted signal is 
contained in the multipath component of the received signal. Thus, taking the viewpoint 
that multipath may be approximated as a linear combination of differently delayed echoes,
as shown in the maximal ratio combiner of Figure 9.21, the RAKE receiver seeks to 
combat the effect of multipath by using a correlation method to detect the echo signals 
individually and then adding them algebraically. In this way, intersymbol interference due 
to multipath is dealt with by reinserting different delays into the detected echoes so that 
they perform a constructive rather than destructive role. 

Figure 9.35 shows the basic idea behind the RAKE receiver. The receiver consists of a 
number of correlators connected in parallel and operating in a synchronous fashion with 
each other. Each correlator has two inputs: (1) a delayed version of the received signal and 
(2) a replica of the PN sequence used as the spreading code to generate the spread 
spectrum-modulated signal at the transmitter. In effect, the PN sequence acts as a 
reference signal. Let the nominal bandwidth of the PN sequence be denoted as $W = 1/T_c$,
where $T_c$ is the chip duration. From the discussion on PN sequences presented in
Appendix J, we find that the autocorrelation function of a PN sequence has a single peak
of width 1/W, and it falls toward zero elsewhere inside one period of the PN
sequence (i.e., one symbol period). Thus, we need only make the bandwidth W of the PN
sequence sufficiently large to identify the significant echoes in the received signal. To be 
sure that the correlator outputs all add constructively, two other operations are performed 
in the receiver by the functional blocks labeled “phase and gain adjustors”: 

An appropriate delay is introduced into each correlator output, so that the phase 
angles of the correlator outputs are in agreement with each other. 

The correlator outputs are weighted so that the correlators responding to strong 
paths in the multipath environment have their contributions accentuated, while the 
correlators not synchronizing with any significant path are correspondingly 
suppressed. 



Figure 9.35 Block diagram of the RAKE receiver for CDMA over multipath channels.

The weighting coefficients $a_k$ are computed in accordance with the maximal ratio
combining principle, discussed in Section 9.8. Specifically, we recall that the SNR of a
weighted sum, where each element of the sum consists of a signal plus additive noise of
fixed power, is maximized when the amplitude weighting is performed in proportion to the
pertinent signal strength. That is, the linear combiner output is

$$y(t) = \sum_{k=1}^{M} a_k z_k(t)$$

where $z_k(t)$ is the phase-compensated output of the kth correlator and M is the number of
correlators in the receiver. Provided that we use enough correlators in the receiver to span
a region of delays sufficiently wide to encompass all the significant echoes that are likely
to occur in the multipath environment, the output y(t) behaves essentially as though there
was a single propagation path between the transmitter and receiver rather than a series of
multiple paths spread in time.
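
The weighting rule can be checked with a small calculation. The sketch below assumes real, phase-compensated correlator outputs with known signal amplitudes and independent noise of equal power in every finger, and compares three choices of weights: proportional to amplitude (maximal ratio), equal weights, and selection of the strongest finger. The amplitudes, the noise power, and the function name are illustrative assumptions.

```python
def combined_snr(amplitudes, weights, noise_power):
    """SNR of sum_k a_k z_k when z_k = s_k + noise of equal power in each finger."""
    signal = sum(w * s for w, s in zip(weights, amplitudes))
    noise = noise_power * sum(w * w for w in weights)
    return signal ** 2 / noise


if __name__ == "__main__":
    amplitudes = [1.0, 0.5, 0.2]   # assumed echo amplitudes at three fingers
    noise_power = 0.1
    print(combined_snr(amplitudes, amplitudes, noise_power))   # maximal ratio
    print(combined_snr(amplitudes, [1, 1, 1], noise_power))    # equal weights
    print(combined_snr(amplitudes, [1, 0, 0], noise_power))    # strongest finger only
```

With weights proportional to the amplitudes, the combined SNR equals the sum of the individual finger SNRs, which no other choice of weights exceeds.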

To simplify the presentation, the receiver of Figure 9.35 assumes the use of binary PSK in
performing spread spectrum modulation at the transmitter. Thus, the final operation
performed in Figure 9.35 is that of integrating the linear combiner output y(t) over the bit
duration $T_b$ and then determining whether binary symbol 1 or 0 was transmitted in that bit interval.

The RAKE receiver derives its name from the fact that the bank of parallel correlators 
has an appearance similar to the fingers of a rake; see Figure 9.36. Because spread 
spectrum modulation is basic to the operation of CDMA wireless communications, it is 
natural for the RAKE receiver to be central to the design of the receiver used in this type of 
multiuser radio communication. 


Figure 9.36 Picture of a rake, symbolizing the bank of correlators.


Summary and Discussion 


In this chapter we discussed the topic of signaling over fading channels, which is at the 
heart of wireless communications. There are three major sources of signal degradation in 
wireless communications: 

• co-channel interference, 

• fading, and 

• delay spread. 

The latter two are by-products of the multipath phenomenon. A common characteristic of
these channel impairments is that they are all signal-dependent phenomena. As with the
intersymbol interference that characterizes signaling over the band-limited channels discussed
in Chapter 8, the degrading effects of interference and multipath in wireless
communications cannot be combated by simply increasing the transmitted signal power, which is
what is done when noise is the only source of channel impairment, as discussed in Chapter 7.

To combat the effects of multipath and interference, we require the use of specialized 
techniques that are tailor-made for wireless communications. These specialized techniques 
include space diversity, which occupied much of the material presented in this chapter. 
We discussed different forms of space diversity, the main idea behind which is that two
or more propagation paths connecting the receiver to the transmitter are better than a 
single propagation path. In historical terms, the first form of space diversity used to 
mitigate the multipath fading problem was that of receive diversity, involving a single 
transmit antenna and multiple receive antennas. Under receive diversity, we discussed the 
selection combiner, maximal-ratio combiner, and equal-gain combiner: 

• The selection combiner is the simplest form of receive diversity. It operates on the
principle that it is possible to select, among $N_r$ receive-diversity branches, a
particular branch with the largest output SNR; the branch so selected defines the
desired received signal.

• The maximal-ratio combiner is more powerful than the selection combiner by virtue
of the fact that it exploits the full information content of all the $N_r$ receive-diversity
branches about the transmitted signal of interest; it is characterized by a set of $N_r$
complex weighting factors that are chosen to maximize the output SNR of
the combiner.

• The equal-gain combiner is a simplified version of the maximal-ratio combiner. 
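
To put rough numbers on the comparison of the three combiners, the sketch below evaluates the standard closed-form expressions for the mean output SNR relative to the average branch SNR. It assumes independent Rayleigh-fading branches with equal average SNR; these formulas are quoted as well-known results and are not derived in this summary, and the function name is illustrative.

```python
import math

def mean_snr_gain_db(n_r):
    """Mean output SNR relative to the per-branch average SNR, in dB.

    Assumes n_r independent Rayleigh-fading branches with equal average SNR:
      selection combiner:      sum_{k=1}^{n_r} 1/k
      maximal-ratio combiner:  n_r
      equal-gain combiner:     1 + (n_r - 1) * pi / 4
    """
    gains = {
        "selection": sum(1.0 / k for k in range(1, n_r + 1)),
        "maximal_ratio": float(n_r),
        "equal_gain": 1.0 + (n_r - 1) * math.pi / 4.0,
    }
    return {name: 10.0 * math.log10(g) for name, g in gains.items()}

for n_r in (2, 3, 4, 5, 6):
    row = mean_snr_gain_db(n_r)
    print(n_r, {name: round(db, 2) for name, db in row.items()})
```

Under these assumptions the equal-gain combiner stays within about 1 dB of the maximal-ratio combiner, while the selection combiner falls progressively further behind as the number of branches grows.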

We also discussed diversity-on-transmit techniques, which may be viewed as the dual of
their respective diversity-on-receive techniques. Much of the discussion here focused on
the Alamouti code, which is simple to design, yet powerful in performance, in that it
realizes a two-level diversity gain: in terms of performance, the Alamouti code is
equivalent to a linear diversity-on-receive system with a single transmit antenna and two
receive antennas.

By far, the most powerful form of space diversity is the use of multiple antennas at both 
the transmit and receive ends of the wireless link. The resulting configuration is referred to 
as a MIMO wireless communication system, which includes the receive diversity and 
transmit diversity as special cases. The novel feature of the MIMO system is that, in a rich 
scattering environment, it can provide a high spectral efficiency, which may be simply 
explained as follows. The signals transmitted simultaneously by the transmit antennas 
arrive at the input of each receive antenna in an uncorrelated manner due to the rich 
scattering mechanism of the channel. The net result is a spectacular increase in the spectral 
efficiency of the wireless link. Most importantly, the spectral efficiency increases roughly 
linearly with the number of transmit or receive antennas, whichever is the smaller one of 
the two. This important result assumes that the receiver has knowledge of the channel 
state. The spectral efficiency of the MIMO system can be further enhanced by including a
feedback channel from the receiver to the transmitter, whereby the channel state is also
made available to the transmitter, enabling the transmitter to exercise control
over the transmitted signal.
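
The roughly linear growth of spectral efficiency with the number of antennas can be illustrated with a short Monte Carlo sketch of the log-det capacity formula. It assumes a flat Rayleigh-fading channel with independent, unit-variance complex Gaussian entries, channel state known only at the receiver, and equal power on all transmit antennas; the function names, trial count, and seed are illustrative choices.

```python
import numpy as np

def logdet_capacity(H, snr):
    """Capacity in bits/s/Hz: log2 det(I + (snr / N_t) H H^dagger)."""
    n_r, n_t = H.shape
    M = np.eye(n_r) + (snr / n_t) * (H @ H.conj().T)
    _, logdet = np.linalg.slogdet(M)          # natural log of |det(M)|
    return logdet / np.log(2.0)

def ergodic_capacity(n_t, n_r, snr, trials=2000, seed=0):
    """Average the log-det capacity over random Rayleigh channel draws."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        H = (rng.standard_normal((n_r, n_t))
             + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2.0)
        total += logdet_capacity(H, snr)
    return total / trials

snr = 10.0  # linear scale, i.e., 10 dB
for n in (1, 2, 4):
    print(n, round(ergodic_capacity(n, n, snr), 2))
```

In this sketch the estimated ergodic capacity approximately doubles each time the number of antennas at both ends is doubled, which is the behavior described above.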


An issue of paramount practical importance in wireless communications is that of multiple 
access to the wireless channel, in the context of which the following two approaches are 
considered to be the dominant ones: 

Orthogonal frequency division multiple access (OFDMA), which is the multi-user 
version of OFDM that was discussed in Section 9.12. In OFDMA multiple access is 
accomplished through the assignment of subchannels (subcarriers) to individual users. 
Naturally, OFDMA inherits the distinctive features of OFDM. In particular, OFDMA
is well suited for high data-rate transmissions over delay-dispersive channels, realized 
by exploiting the principle of “divide and conquer.” Accordingly, OFDMA is 
computationally efficient in using the FFT algorithm. Moreover, OFDMA lends itself 
to the combined use of MIMO, hence the ability to improve spectral efficiency and 
take advantage of channel flexibility. 

Code-division multiple access (CDMA), which distinguishes itself by exploiting the
underlying principle of spread spectrum signals, discussed in Section 9.13. To be 
specific, through the combined process of spectrum spreading in the transmitter and 
corresponding spectrum despreading in the receiver, a certain amount of processing 
gain is obtained, hence the ability of CDMA users to occupy the same channel 
bandwidth. Moreover, CDMA provides a flexible procedure for the allocation of 
resources (i.e., PN codes) among a multiplicity of active users. Last but by no means 
least, in using the RAKE, viewed as an adaptive TDL filter, CDMA is enabled to 
match the receiver input to the channel output by adjusting tap delays as well as tap 
weights, thereby enhancing receiver performance in the presence of multipath. 
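
The spreading and despreading operations that let CDMA users share the same bandwidth can be sketched in a few lines. This is a simplified illustration, assuming two chip-synchronous users, a noiseless single-path channel, and length-4 Walsh-Hadamard codes in place of long PN codes; the function names and bit patterns are hypothetical.

```python
import numpy as np

# Rows are mutually orthogonal length-4 Walsh-Hadamard codes (entries +/-1).
codes = np.array([[1,  1,  1,  1],
                  [1, -1,  1, -1],
                  [1,  1, -1, -1],
                  [1, -1, -1,  1]])

def spread(bits, code):
    """Map bits {0,1} to symbols {-1,+1} and multiply each symbol by the code."""
    symbols = 2 * np.asarray(bits) - 1
    return np.kron(symbols, code)             # one code period per bit

def despread(chips, code):
    """Correlate each code period with the user's code and threshold at zero."""
    blocks = chips.reshape(-1, len(code))
    correlations = blocks @ code
    return (correlations > 0).astype(int)

bits_a = [1, 0, 1, 1]
bits_b = [0, 0, 1, 0]

# Both users transmit at the same time in the same band.
received = spread(bits_a, codes[1]) + spread(bits_b, codes[2])

recovered_a = despread(received, codes[1])
recovered_b = despread(received, codes[2])
print(recovered_a, recovered_b)
```

Because the two codes are orthogonal over one bit interval, each correlator output depends only on the bit of the intended user. With PN codes the cross-correlation is small rather than exactly zero, so the other users appear as residual interference.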

To conclude, OFDMA and CDMA provide two different approaches for the multiple 
access of active users to wireless channels, each one of which builds on its own distinctive 
features. 


Problems 

Effect of Flat Fading on the BER of Digital Communications Receivers 

Derive the BER formulas listed in the right-hand side of Table 9.2 for the following signaling 
schemes over flat fading channels: 

Binary PSK using coherent detection 
Binary FSK using coherent detection 
Binary DPSK 

Binary FSK using noncoherent detection 

Using the formulas derived in Problem 9.1, plot the BER charts for the schemes described therein. 

Selective Channels 

Consider a time-selective channel, for which the modulated received signal is defined by

$$x(t) = \sum_{n=1}^{N} \alpha_n(t)\, m(t) \cos\big(2\pi f_c t + \phi(t) + \sigma_n(t)\big)$$

where $m(t)$ is the message signal and $\phi(t)$ is the result of angle modulation; the amplitude
$\alpha_n(t)$ and phase $\sigma_n(t)$ are contributed by the $n$th path, where $n = 1, 2, \ldots, N$.

Using complex notation, show that the received signal is described as follows:

$$\tilde{x}(t) = \tilde{\alpha}(t)\, \tilde{s}(t)$$

where

$$\tilde{\alpha}(t) = \sum_{n=1}^{N} \tilde{\alpha}_n(t)$$

What is the formula for $\tilde{s}(t)$?

Show that the delay-spread function of the multipath channel is described by

$$\tilde{h}(\tau; t) = \tilde{\alpha}(t)\, \delta(\tau)$$

where $\delta(\tau)$ is the Dirac delta function in the $\tau$-domain. Hence, justify the statement that the
channel described in this problem is a time-selective channel.

Let $S_{\tilde{\alpha}}(f)$ and $S_{\tilde{s}}(f)$ denote the Fourier transforms of $\tilde{\alpha}(t)$ and $\tilde{s}(t)$, respectively. What then is
the Fourier transform of $\tilde{x}(t)$?

Using the result of part c, justify the statement that the multipath channel described herein can be
approximately frequency-flat. What is the condition that would satisfy this description?

In this problem, we consider a multipath channel embodying large-scale effects. Specifically, using
complex notation, the received signal at the channel output is described by

$$\tilde{x}(t) = \sum_{l=1}^{L} \tilde{\alpha}_l\, \tilde{s}(t - \tau_l)$$

where $\tilde{\alpha}_l$ and $\tau_l$ denote the amplitude and time delay associated with the $l$th path in the channel for
$l = 1, 2, \ldots, L$. Note that $\tilde{\alpha}_l$ is assumed to be constant for all $l$.

Show that the delay-spread function of the channel is described by

$$\tilde{h}(\tau; t) = \sum_{l=1}^{L} \tilde{\alpha}_l\, \delta(\tau - \tau_l)$$

where $\delta(\tau)$ is the Dirac delta function expressed in the $\tau$-domain.

This channel is said to be time-nonselective. Why?

The channel does exhibit a frequency-dependent behavior. To illustrate this behavior, consider
the following delay-spread function:

$$\tilde{h}(\tau; t) = \delta(\tau) + \tilde{\alpha}_2\, \delta(\tau - \tau_2)$$

where $\tau_2$ is the time delay produced by the second path in the channel. Plot the magnitude
(amplitude) response of the channel for the following specifications:

$\tilde{\alpha}_2 = 0.5$
$\tilde{\alpha}_2 = j/2$
$\tilde{\alpha}_2 = -j$

where $j = \sqrt{-1}$. Comment on your results.

Expanding on the multipath channel considered in Problem 9.4, a more interesting case is
characterized by the scenario in which the received signal at the channel output is described as follows:

$$\tilde{x}(t) = \sum_{l=1}^{L} \tilde{\alpha}_l(t)\, \tilde{s}\big(t - \tau_l(t)\big)$$

where the amplitude $\tilde{\alpha}_l(t)$ and time delay $\tau_l(t)$ for the $l$th path are both time dependent for
$l = 1, 2, \ldots, L$.

Show that the delay-spread function of the multipath channel described herein is given by

$$\tilde{h}(\tau; t) = \sum_{l=1}^{L} \tilde{\alpha}_l(t)\, \delta\big(\tau - \tau_l(t)\big)$$

where $\delta(\tau)$ is the Dirac delta function in the $\tau$-domain. This channel is said to exhibit both
large- and small-scale effects. Why?

The channel is also said to be both time selective and frequency selective. Why?

To illustrate the point made under b, consider the following channel description:

$$\tilde{h}(\tau; t) = \tilde{\alpha}_1(t)\, \delta(\tau) + \tilde{\alpha}_2(t)\, \delta(\tau - \tau_2)$$

where $\tilde{\alpha}_1(t)$ and $\tilde{\alpha}_2(t)$ are both Rayleigh processes.

For selected $\tilde{\alpha}_1(t)$, $\tilde{\alpha}_2(t)$, and $\tau_2$, do the following:

At each time $t \ge 0$, compute the Fourier transform of $\tilde{h}(\tau; t)$.

Hence, plot the magnitude spectrum of the channel, that is, $|H(f; t)|$, expressed as a
function of both time $t$ and frequency $f$.

Comment on the results so obtained.

Consider a multipath channel where the delay-spread function is described by

$$\tilde{h}(\tau; t) = \sum_{l=1}^{L} \tilde{\alpha}_l(t)\, \delta(\tau - \tau_l)$$

where the scattering processes attributed to the time-varying amplitude $\tilde{\alpha}_l(t)$ and fixed delay $\tau_l$ are
uncorrelated for $l = 1, 2, \ldots, L$.

Determine the correlation function of the channel, namely $R_{\tilde{h}}(\tau_1, t_1; \tau_2, t_2)$.

With a Jakes model for the scattering process described in (9.12), find the corresponding formula 
for the correlation function of the channel under part a of the problem. 

Hence, justify the statement that the multipath channel described in this problem fits a WSSUS 
model. 

Revisit the Jakes model for a fast fading channel described in (9.12). Let the coherence time be
defined as that range of values $\Delta t$ over which the correlation function defined in (9.12) is greater
than 0.5.

For some prescribed maximum Doppler shift $\nu_{\max}$, find the coherence time of the channel.

Consider a multipath channel for which the delay-spread function is given by

$$\tilde{h}(\tau; t) = \sum_{l=1}^{L} \tilde{\alpha}_l(t)\, \delta(\tau - \tau_l)$$

where the amplitude $\tilde{\alpha}_l(t)$ is time varying but the time delay $\tau_l$ is fixed. As in Problem 9.4, the
scattering processes are described by the Jakes model in (9.12). Determine the power-delay profile
of the channel, $P_{\tilde{h}}(\tau)$.

In real-life situations, the wireless channel is nonstationary due to the presence of moving objects of 
different kinds and other physical elements that can significantly affect radio propagation. Naturally, 
different types of wireless channels have different degrees of nonstationarity. 

Even though many wireless communication channels are indeed highly nonstationary, the WSSUS 
model described in Section 9.4 still provides a reasonably accurate account of the statistical 
characteristics of the channel. Elaborate on this statement. 

“Space Diversity-on-Receive” Systems 

Following the material presented on Rayleigh fading in Chapter 4, derive the probability density 
function of (9.64). 

A receive-diversity system uses a selection combiner with two diversity paths. The outage occurs
when the instantaneous SNR $\gamma$ drops below $0.25\gamma_{\text{av}}$, where $\gamma_{\text{av}}$ is the average SNR.

Determine the probability of outage experienced by the receiver. 
The average SNR in a selection combiner is 20 dB. Compute the probability that the instantaneous
SNR of the selection combiner drops below $\gamma = 10$ dB for the following number of receive antennas:
$N_r = 1$
$N_r = 2$
$N_r = 3$
$N_r = 4$

Comment on your results.


Repeat Problem 9.12 for $\gamma = 15$ dB.


In Section 9.8 we derived the optimum values of (9.75) for complex weighting factors of the
maximal-ratio combiner using the Cauchy-Schwarz inequality.

This problem addresses the same issue, but this time we use the standard maximization procedure.
To simplify matters, the number of diversity paths $N_r$ is restricted to two, with the complex
weighting parameters denoted by $a_1$ and $a_2$. Let

$$a_k = x_k + j y_k, \quad k = 1, 2$$

The complex derivative with respect to $a_k$ is defined by

$$\frac{\partial}{\partial a_k^*} = \frac{1}{2}\left(\frac{\partial}{\partial x_k} + j \frac{\partial}{\partial y_k}\right), \quad k = 1, 2$$

Applying this formula to the combiner’s output SNR $\gamma_c$ of (9.71), derive the optimum values given in (9.75).

As discussed in Section 9.8, an equal-gain combiner is a special form of the maximal-ratio combiner 
for which the weighting factors are all equal. For convenience of presentation, the weighting 
parameters are set to unity. 

Assuming that the instantaneous SNR $\gamma$ is small compared with the average SNR $\gamma_{\text{av}}$, derive an
approximate formula for the probability density function of the random variable $\Gamma$ represented by
the sample $\gamma$.

Compare the performances of the following linear “diversity-on-receive” techniques: 

Selection combiner. 

Maximal-ratio combiner. 

Equal-gain combiner. 

Base the comparison on signal-to-noise improvement, expressed in decibels, for the following
number of diversity branches: $N_r = 2, 3, 4, 5, 6$.

Show that the maximum-likelihood decision rule for the maximal-ratio combiner may be formulated
in the following two equivalent forms:

If

$$(\alpha_1^2 + \alpha_2^2)|s_i|^2 - y_1 s_i^* - y_1^* s_i < (\alpha_1^2 + \alpha_2^2)|s_k|^2 - y_1 s_k^* - y_1^* s_k, \quad k \neq i$$

then choose symbol $s_i$ over $s_k$.

If, by the same token,

$$(\alpha_1^2 + \alpha_2^2 - 1)|s_i|^2 + d^2(y_1, s_i) < (\alpha_1^2 + \alpha_2^2 - 1)|s_k|^2 + d^2(y_1, s_k), \quad k \neq i$$

then choose symbol $s_i$ over $s_k$. Here, $d^2(y_1, s_i)$ denotes the squared Euclidean distance between
the signal points $y_1$ and $s_i$.

It may be argued that, in a rather loose sense, transmit-diversity and receive-diversity antenna 
configurations are the dual of each other, as illustrated in Figure P9.18. 

Taking a general viewpoint, justify the mathematical basis for this duality. 
However, we may cite the example of frequency-division diplexing (FDD) for which, in a strict
sense, we find that the duality depicted in Figure P9.18 is violated. How is it possible for the 
violation to arise in this example? 
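“Space Diversity-on-Transmit” Systems

Show that the two-by-two channel matrix in (9.88), defined in terms of the multiplicative fading
factors $\alpha_1 e^{j\theta_1}$ and $\alpha_2 e^{j\theta_2}$, is a unitary matrix, as shown by

$$\begin{bmatrix} \alpha_1 e^{j\theta_1} & \alpha_2 e^{j\theta_2} \\ \alpha_2 e^{-j\theta_2} & -\alpha_1 e^{-j\theta_1} \end{bmatrix}^{\dagger} \begin{bmatrix} \alpha_1 e^{j\theta_1} & \alpha_2 e^{j\theta_2} \\ \alpha_2 e^{-j\theta_2} & -\alpha_1 e^{-j\theta_1} \end{bmatrix} = (\alpha_1^2 + \alpha_2^2) \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
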

Derive the formula for the average probability of symbol error incurred by the Alamouti code. 

Figure P9.22 shows the extension of orthogonal space-time codes to the Alamouti code, using two
antennas on both transmit and receive. The sequence of signal encoding and transmissions is
identical to that of the single-receiver case of Figure 9.18. Part a of the table below defines the
channels between the transmit and receive antennas. Part b of the table defines the outputs of the
receive antennas at times $t'$ and $t' + T$, where $T$ is the symbol duration.

Derive expressions for the received signals $\tilde{x}_1$, $\tilde{x}_2$, $\tilde{x}_3$, and $\tilde{x}_4$, including the respective additive
noise components, expressed in terms of the transmitted symbols.

Derive expressions for the line of combined outputs in terms of the received signals.

Derive the maximum-likelihood decision rule for the estimates $\hat{s}_1$ and $\hat{s}_2$.


                          Receive antenna 1    Receive antenna 2
a.  Transmit antenna 1    $h_1$                $h_3$
    Transmit antenna 2    $h_2$                $h_4$

b.  Time $t'$             $\tilde{x}_1$        $\tilde{x}_3$
    Time $t' + T$         $\tilde{x}_2$        $\tilde{x}_4$


This problem explores a new interpretation of the Alamouti code. Let

$$\tilde{s}_i = s_i^{(1)} + j s_i^{(2)}, \quad i = 1, 2$$

where $s_i^{(1)}$ and $s_i^{(2)}$ are both real numbers. The complex entry $\tilde{s}_i$ in the 2-by-2 Alamouti code is
represented by the 2-by-2 real orthogonal matrix

$$\begin{bmatrix} s_i^{(1)} & s_i^{(2)} \\ -s_i^{(2)} & s_i^{(1)} \end{bmatrix}, \quad i = 1, 2$$

Likewise, the complex-conjugated entry $\tilde{s}_i^*$ is represented by the 2-by-2 real orthogonal matrix

$$\begin{bmatrix} s_i^{(1)} & -s_i^{(2)} \\ s_i^{(2)} & s_i^{(1)} \end{bmatrix}, \quad i = 1, 2$$

Show that the 2-by-2 complex Alamouti code $\mathbf{S}$ is equivalent to the 4-by-4 real transmission
matrix

$$\mathbf{S}_4 = \begin{bmatrix} s_1^{(1)} & s_1^{(2)} & s_2^{(1)} & s_2^{(2)} \\ -s_1^{(2)} & s_1^{(1)} & -s_2^{(2)} & s_2^{(1)} \\ -s_2^{(1)} & s_2^{(2)} & s_1^{(1)} & -s_1^{(2)} \\ -s_2^{(2)} & -s_2^{(1)} & s_1^{(2)} & s_1^{(1)} \end{bmatrix}$$

Show that $\mathbf{S}_4$ is an orthogonal matrix.

What is the advantage of the complex code $\mathbf{S}$ over the real code $\mathbf{S}_4$?

For two transmit antennas and a single receive antenna, the Alamouti code is said to be the only
optimal space-time block code. Using the log-det formula of (9.117), justify this statement.

Show that the channel capacity of the Alamouti code is equal to the sum of the channel capacities of 
two SISO systems with each one of them operating at half the original bit rate. 

MIMO Wireless Communications 

Show that, at high SNRs, the capacity gain of a MIMO wireless communication system with the
channel state known to the receiver is $N = \min\{N_t, N_r\}$ bits per second per hertz for every 3 dB
increase in SNR.

To calculate the outage probability of MIMO systems, we use the complementary cumulative
distribution function of the random channel matrix $\mathbf{H}$ rather than the cumulative probability function itself.
Explain this rationale for calculating the outage probability.

Equation (9.120) defines the formula for the channel capacity of diversity-on-receive channel. 

In Section 9.8 we pointed out that the selection combiner is a special case of the maximal-ratio 
combiner. Using (9.120), formulate an expression for the channel capacity of wireless diversity 
using the selection combiner. 

For the special case of a MIMO system having N t = N r = N , show that the ergodic capacity of the 
system scales linearly, rather than logarithmically, with increasing SNR as N approaches infinity. 

In this problem we continue with the solution to Problem 9.28, namely 

aS7V ^ C ° 

where $N_t = N_r = N$ and $\lambda_{\text{av}}$ is the average eigenvalue of the matrix product $\mathbf{H}\mathbf{H}^{\dagger} = \mathbf{H}^{\dagger}\mathbf{H}$. What is
the value of the constant?

Justify the asymptotic result given in (9.119); that is,

$$\frac{C}{N} \to \text{constant}$$

What conclusion can you draw from this asymptotic result? 

Suppose that an additive, temporally stationary, Gaussian interference $\mathbf{v}(t)$ corrupts the basic
complex channel model of (9.105). The interference $\mathbf{v}(t)$ has zero mean and correlation matrix $\mathbf{R}_v$.
Evaluate the effect of the interference $\mathbf{v}(t)$ on the ergodic capacity of the MIMO link.

Consider a MIMO link for which the channel may be considered to be essentially “constant for k 
users of the channel.” 

Starting with the basic channel model of (9.105), formulate the input-output relationship of this
link with the input being described by the $N_t$-by-$k$ matrix

$$\mathbf{S} = [\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_k]$$

How is the log-det capacity formula of the link correspondingly modified? 

In a MIMO channel, the ability to exploit space-division multiple-access techniques for spectrally
efficient wireless communications is determined by the rank of the complex channel matrix $\mathbf{H}$. (The
rank of a matrix is defined by the number of independent columns in the matrix.) For a given $(N_t, N_r)$
antenna configuration, it is desirable that the rank of $\mathbf{H}$ equal the smaller of the $N_t$ transmit and $N_r$
receive antennas, for it is only then that we are able to exploit the full potential of the MIMO antenna
configuration. Under special conditions, however, the rank of the channel matrix $\mathbf{H}$ is reduced to
unity, in which case the scattering (fading) energy flow across the MIMO link is effectively confined
to a very narrow pipe, and with it, the channel capacity is severely degraded.

Under the special conditions just described, a physical phenomenon known as the keyhole channel or 
pinhole channel is known to arise. Using a propagation layout of the MIMO link, describe how this 
phenomenon can be explained. 

OFDMA and CDMA 

Parts a and b of Figure 9.31 show the block diagrams of the transmitter and receiver of an OFDM 
system, formulated on the basis of digital signal processing. It is informative to construct an analog 
interpretation of the OFDM system, which is the objective of this problem. 

Construct the analog interpretations of parts a and b in Figure 9.31. 

With this construction at hand, compare the advantages and disadvantages of the digital and 
analog implementations of OFDM. 

Figure P9.34 depicts the model of a DS/BPSK system, where the order of spectrum spreading and 
BPSK in the actual system has been interchanged; this is feasible because both operations are linear. 
For system analysis, we build on signal-space theoretic ideas of Chapter 7, using this model and 
assuming the presence of a jammer at the receiver input. Thus, whereas signal-space representation of 
the transmitted signal, x(t), is one-dimensional, that of the jammer, j(t), is two-dimensional. 

Derive the processing gain formula of (9.125). 

Next, ignoring the benefit gained from coherent detection, derive the SNR formula of (9.126). 



Notes 


1. Local propagation effects are discussed in Chapter 1 of the classic book by Jakes (1974). For a 
comprehensive treatment of this subject, see the books by Parsons (2000) and Molisch (2011). 

2. Bessel functions are discussed in Appendix C. 

3. To be precise, we should use the terminology “autocorrelation” function rather than “correlation”
function as we did in Section 9.3. However, to be consistent with the literature, hereafter we use the
terminology “correlation function” for the sake of simplicity.

4. On the basis of many measurements, the power-delay profile may be approximated by the
one-sided exponential function (Molisch, 2011):

$$P_{\tilde{h}}(\tau) = \begin{cases} \exp(-\tau/\sigma_\tau), & \text{for } \tau \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

For a more generic model, the power-delay profile is viewed as the sum of several one-sided
exponential functions representing multiple clusters of interacting objects, as shown by

$$P_{\tilde{h}}(\tau) = \sum_i \frac{P_i}{\sigma_{\tau,i}} \exp\left(-\frac{\tau - \tau_i}{\sigma_{\tau,i}}\right)$$

where $P_i$, $\tau_i$, and $\sigma_{\tau,i}$ are respectively the power, delay, and delay spread of the $i$th cluster, and
the $i$th term is taken to be zero for $\tau < \tau_i$.

5. The approximate approach described in Section 9.5 follows Van Trees (1971). 

6. The complex tap-coefficient c n (t) is also referred to as the tap-gain or tap-weight. 

7. The chi-squared distribution with two degrees of freedom is described in Appendix A. 

8. The term “maximal-ratio combiner” was coined in a classic paper on linear diversity combining 
techniques by Brennan (1959). 

9. The three-point exposition presented in this section on maximal-ratio combining follows the 
chapter by Stein in Schwartz et al. (1966: 653-654). 

10. The idea of MIMO for wireless communications was first described in the literature by Foschini
(1996). In the same year, Telatar (1996) derived the capacity of multi-antenna Gaussian channels in
a technical report.

11. As a result of experimental measurements, the model is known to be decidedly non-Gaussian 
owing to the impulsive nature of human-made electromagnetic interference and natural noise. 

12. Detailed derivation of the ergodic capacity in (9.115) is presented in Appendix E.

13. The idea of OFDM has a long history, dating back to Chang (1966). Then, Weinstein and Ebert
(1971) used the FFT algorithm and guard intervals for the first digital implementation of OFDM.
The first use of OFDM for mobile communications is credited to Cimini (1985).

In the meantime, OFDM has developed into an indispensable tool for broadband wireless 
communications and digital audio broadcasting. 

14. The literature on spread spectrum communications is enormous. For classic papers on spread 
spectrum communications, see the following two: 

• The paper by Scholtz (1982) describes the origins of spread spectrum communications. 

• The paper by Pickholtz, et al. (1982) addresses the fundamentals of spread spectrum 
communications. 

15. The Walsh-Hadamard sequences (codes) are named in honor of two pioneering contributions: 

• Joseph L. Walsh (1923) for finding a new set of orthogonal functions with entries ±1 . 

• Jacques Hadamard (1893) for finding a new set of square matrices also with entries ±1 , which 
had all their rows (and columns) orthogonal. 

For more detailed treatments of these two papers, see Harmuth (1970), and Seberry and Yamada 
(1992), respectively. 

16. To be rigorous mathematically, we should speak of the matrices $\mathbf{A}$ and $\mathbf{B}$ as being over the Galois
field GF(2). To explain, for any prime $p$, there exists a finite field of $p$ elements, denoted by GF($p$).
For any positive integer $b$, we may expand the finite field GF($p$) to a field of $p^b$ elements, which is
called an extension field of GF($p$) and denoted by GF($p^b$). Finite fields are also called Galois fields in
honor of their discoverer.

Thus, for the example of (9.129), we have a Galois field of $p = 2$ and thus write GF(2).
Correspondingly, for the $\mathbf{H}_4$ in (9.130) we have the Galois field GF($2^2$) = GF(4).

17. The original papers on Gold sequences are Gold (1967, 1968). A detailed discussion of Gold 
sequences is presented in Holmes (1982). 

18. The classic paper on the RAKE receiver is due to Price and Green (1958). For a good treatment 
of the RAKE receiver, more detailed than that presented in Section 9.15, see Chapter 5 in the book 
by Haykin and Mohr (2005). For application of the RAKE receiver in CDMA, see the book by 
Viterbi (1995). 
Introduction


In the previous three chapters we studied the important issue of data transmission over 
communication channels under three different channel-impairment scenarios: 

• In Chapter 7 the focus of attention was on the kind of channels where AWGN is the 
main source of channel impairment. An example of this first scenario is a satellite- 
communication channel. 

• In Chapter 8 the focus of attention was intersymbol interference as the main source 
of channel impairment. An example of this second scenario is the telephone 
channel. 

• Then, in Chapter 9, we focused on multipath as a source of channel impairment. An 
example for this third scenario is the wireless channel. 

Although, indeed, these three scenarios are naturally quite different from each other, they 
do share a common practical shortcoming: reliability. This is where the need for error- 
control coding, the topic of this chapter, assumes paramount importance. 

Given these physical realities, the task facing the designer of a digital communication 
system is that of providing a cost-effective facility for transmitting information from one 
end of the system at a rate and level of reliability and quality that are acceptable to a user 
at the other end. 

From a communication theoretic perspective, the key system parameters available for 
achieving these practical requirements are limited to two: 

• transmitted signal power, and 

• channel bandwidth. 

These two parameters, together with the power spectral density of receiver noise,
determine the signal energy per bit-to-noise power spectral density ratio, $E_b/N_0$. In
Chapter 7 we showed that this ratio uniquely determines the BER produced by a particular
modulation scheme operating over a Gaussian noise channel. Practical considerations
usually place a limit on the value that we can assign to $E_b/N_0$. To be specific, in practice,
we often arrive at a modulation scheme and find that it is not possible to provide
acceptable data quality (i.e., low enough error performance). For a fixed $E_b/N_0$, the only
practical option available for changing data quality from problematic to acceptable is to
use error-control coding, which is the focus of attention in this chapter. In simple terms,
by incorporating a fixed number of redundant bits into the structure of a codeword at the
transmitter, it is feasible to provide reliable communication over a noisy channel, provided
that Shannon’s channel-coding theorem, discussed in Chapter 5, is satisfied. In effect, channel
bandwidth is traded off for reliable communication.

Another practical motivation for the use of coding is to reduce the required $E_b/N_0$ for a
fixed BER. This reduction in $E_b/N_0$ may, in turn, be exploited to reduce the required
transmitted power or reduce the hardware costs by requiring a smaller antenna size in the
case of radio communications.

Error Control Using Forward Error Correction 


Error control for data integrity may be exercised by means of forward error correction 
(FEC). Figure 10.1a shows the model of a digital communication system using such an 
approach. The discrete source generates information in the form of binary symbols. The 
channel encoder in the transmitter accepts message bits and adds redundancy according to 
a prescribed rule, thereby producing an encoded data stream at a higher bit rate. The 
channel decoder in the receiver exploits the redundancy to decide which message bits in 
the original data stream, given a noisy version of the encoded data stream, were actually 
transmitted. The combined goal of the channel encoder and decoder is to minimize the 
effect of channel noise. That is, the number of errors between the channel encoder input 
(derived from the source) and the channel decoder output (delivered to the user) is 
minimized. 
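
As a minimal illustration of how added redundancy lets the decoder undo channel errors, the sketch below uses a rate-1/3 repetition code with majority-vote decoding. The repetition code is chosen here only for its simplicity and is not a code singled out by the text; the function names and the error positions are hypothetical.

```python
def encode_repetition(bits, n=3):
    """Channel encoder: repeat every message bit n times."""
    return [b for b in bits for _ in range(n)]

def decode_repetition(received, n=3):
    """Channel decoder: majority vote over each block of n received bits."""
    return [1 if sum(received[i:i + n]) > n // 2 else 0
            for i in range(0, len(received), n)]

message = [1, 0, 1]
codeword = encode_repetition(message)

# Model channel noise as one flipped bit in the first block and one in the second.
corrupted = codeword.copy()
corrupted[1] ^= 1
corrupted[5] ^= 1

decoded = decode_repetition(corrupted)
print(codeword, corrupted, decoded)
```

Each block of three bits tolerates a single error, at the cost of tripling the number of transmitted bits. This is the bandwidth-for-reliability trade-off in its crudest form.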

For a fixed modulation scheme, the addition of redundancy in the coded messages 
implies the need for increased transmission bandwidth. Moreover, the use of error-control 
coding adds complexity to the system. Thus, the design trade-offs in the use of error-control 
coding to achieve acceptable error performance include considerations of bandwidth and 
system complexity. 


Figure 10.1 Simplified models of a digital communication system. (a) Coding and modulation
performed separately. (b) Coding and modulation combined.
There are many different error-correcting codes (with roots in diverse mathematical
disciplines) that we can use. Historically, these codes have been classified into block codes 
and convolutional codes. The distinguishing feature for this particular classification is the 
presence or absence of memory in the encoders for the two codes. 

To generate an (n, k) block code, the channel encoder accepts information in successive 
k-bit blocks; for each block, it adds n - k redundant bits that are algebraically related to 
the k message bits, thereby producing an overall encoded block of n bits, where n > k. The 
n-bit block is called a codeword, and n is called the block length of the code. The channel 
encoder produces bits at the rate $R_0 = (n/k)R_s$, where $R_s$ is the bit rate of the information 
source. The dimensionless ratio $r = k/n$ is called the code rate, where 0 < r < 1. The bit 
rate $R_0$, coming out of the encoder, is called the channel data rate. Thus, the code rate is a 
dimensionless ratio, whereas the data rate produced by the source and the channel data 
rate produced by the encoder are both measured in bits per second. 
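
As a quick numerical check of these definitions, the following Python sketch computes the code rate and the channel data rate. The (7,4) parameters, the 1000 bit/s source rate, and the function names are illustrative assumptions, not values taken from the text.

```python
from fractions import Fraction


def code_rate(n: int, k: int) -> Fraction:
    """Code rate r = k/n of an (n, k) block code (dimensionless)."""
    if not 0 < k < n:
        raise ValueError("an (n, k) block code needs 0 < k < n")
    return Fraction(k, n)


def channel_data_rate(n: int, k: int, source_rate_bps: float) -> float:
    """Channel data rate R_0 = (n/k) * R_s, in bits per second."""
    return source_rate_bps * n / k


# Assumed example: a (7,4) code fed by a 1000 bit/s source.
r = code_rate(7, 4)
R0 = channel_data_rate(7, 4, 1000.0)
print(f"code rate r = {r}, channel data rate R_0 = {R0} bit/s")
```

The encoder output rate exceeds the source rate by the factor n/k, which is the bandwidth cost of the added redundancy.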

In a convolutional code, the encoding operation may be viewed as the discrete-time 
convolution of the input sequence with the impulse response of the encoder. The duration 
of the impulse response equals the memory of the encoder. Accordingly, the encoder for a 
convolutional code operates on the incoming message sequence, using a “sliding window” 
equal in duration to its own memory. This, in turn, means that in a convolutional code, 
unlike in a block code, the channel encoder accepts message bits as a continuous sequence 
and thereby generates a continuous sequence of encoded bits at a higher rate. 
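
The convolution view of the encoder can be sketched in a few lines of Python. The two impulse responses below, (1,1,1) and (1,0,1), and the message sequence are assumptions chosen for illustration; they describe a hypothetical rate-1/2 encoder whose two output streams are interleaved.

```python
def conv_mod2(message, impulse_response):
    """Discrete-time convolution of two bit sequences in modulo-2 arithmetic."""
    out = [0] * (len(message) + len(impulse_response) - 1)
    for i, m in enumerate(message):
        for j, g in enumerate(impulse_response):
            out[i + j] ^= m & g  # modulo-2 multiply-accumulate
    return out


# Hypothetical encoder: one impulse response per output path.
g1, g2 = [1, 1, 1], [1, 0, 1]
message = [1, 0, 0, 1, 1]

path1 = conv_mod2(message, g1)
path2 = conv_mod2(message, g2)

# Interleave the two paths to form the continuous encoded sequence.
encoded = [bit for pair in zip(path1, path2) for bit in pair]
print(encoded)
```

Each output bit depends on the current message bit and on the previous two, which is the "sliding window" behavior described above.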

In the model depicted in Figure 10.1a, the operations of channel coding and modulation 
are performed separately in the transmitter; and likewise for the operations of detection 
and decoding in the receiver. When, however, bandwidth efficiency is of major concern, 
the most effective method of implementing forward error-correction coding is to 
combine it with modulation as a single function, as shown in Figure 10.1b. In this second 
approach, coding is redefined as a process of imposing certain patterns on the transmitted 
signal and the resulting code is called a trellis code. 

Block codes, convolutional codes, and trellis codes represent the classical family of 
codes that follow traditional approaches rooted in algebraic mathematics in one form or 
another. In addition to these classical codes, we now have a “new” generation of coding 
techniques exemplified by turbo codes and low-density parity-check (LDPC) codes. These 
new codes are not only fundamentally different, but they have also quickly displaced the 
legacy coding schemes in many practical systems. Simply put, turbo codes 
and LDPC codes are structured in such a way that decoding can be split into a number of 
manageable steps, thereby making it possible to construct powerful codes in a 
computationally feasible manner, which is not attainable with the legacy codes. Turbo 
codes and LDPC codes are discussed in the latter part of the chapter. 

Discrete Memoryless Channels 


Returning to the model of Figure 10.1a, the waveform channel is said to be memoryless if 
in a given interval the detector output depends only on the signal transmitted in that 
interval and not on any previous transmission. Under this condition, we may model the 
combination of the modulator, the waveform channel, and the demodulator (detector) as a 
discrete memoryless channel. Such a channel is completely described by the set of 
transition probabilities denoted by p(j|i), where i denotes a modulator input symbol, j 
denotes a demodulator output symbol, and p(j|i) is the probability of receiving symbol j 
given that symbol i was sent. (Discrete memoryless channels were described previously at 
some length in Chapter 5 on information theory.) 

The simplest discrete memoryless channel results from the use of binary input and 
binary output symbols. When binary coding is used, the modulator has only the binary 
symbols 0 and 1 as inputs. Likewise, the decoder has only binary inputs if binary 
quantization of the demodulator output is used; that is, a hard decision is made on the 
demodulator output as to which binary symbol was actually transmitted. In this situation, 
we have a binary symmetric channel with a transition probability diagram as shown in 
Figure 10.2. From Chapter 5, we recall that the binary symmetric channel, assuming a 
channel noise modeled as AWGN, is completely described by the transition probability. 
Hard-decision decoding takes advantage of the special algebraic structure that is built into 
the design of channel codes; the decoding is therefore relatively easy to perform. 

However, the use of hard decisions prior to decoding causes an irreversible loss of 
valuable information in the receiver. To reduce this loss, soft-decision coding can be used. 
This is achieved by including a multilevel quantizer at the demodulator output, as 
illustrated in Figure 10.3a for the case of binary PSK signals. The input-output 
characteristic of the quantizer is shown in Figure 10.3b. The modulator has only binary 
symbols 0 and 1 as inputs, but the demodulator output now has an alphabet with Q 
symbols. Assuming the use of the three-bit quantizer described in Figure 10.3b, we have 
Q = 8. Such a channel is called a binary input, Q-ary output discrete memoryless channel. 
The corresponding channel transition probability diagram is shown in Figure 10.3c. The 
form of this distribution, and consequently the decoder performance, depends on the 
location of the representation levels of the quantizer, which, in turn, depends on the signal 
level and noise variance. Accordingly, the demodulator must incorporate automatic gain 
control if an effective multilevel quantizer is to be realized. Moreover, the use of soft 
decisions complicates the implementation of the decoder. Nevertheless, soft-decision 
decoding offers significant improvement in performance over hard-decision decoding by 
taking a probabilistic rather than an algebraic approach. It is for this reason that 
soft-decision decoders are also referred to as probabilistic decoders. 


Figure 10.3 Binary input, Q-ary output discrete memoryless channel. (a) Receiver for binary PSK. 
(b) Transfer characteristic of a multilevel quantizer. (c) Channel transition probability diagram. 
Parts (b) and (c) are illustrated for eight levels of quantization. 

In Chapter 5 on information theory we established the concept of channel capacity, which, 
for a discrete memoryless channel, represents the maximum amount of information that 
can be transmitted per channel use in a reliable manner. The channel coding theorem 
states: 

If a discrete memoryless channel has capacity C and a source generates information at a 
rate less than C, then there exists a coding technique such that the output of the source 
may be transmitted over the channel with an arbitrarily low probability of symbol error. 


For the special case of a binary symmetric channel, the theorem teaches us that if the code 
rate r is less than the channel capacity C, then it is possible to find a code that achieves 
error-free transmission over the channel. Conversely, it is not possible to find such a code 
if the code rate r is greater than the channel capacity C. Thus, the channel coding theorem 
specifies the channel capacity C as a fundamental limit on the rate at which the 
transmission of reliable (error-free) messages can take place over a discrete memoryless 
channel. The issue that matters here is not the SNR, so long as it is large enough, but how 
the channel input is encoded. 
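
To make the comparison between code rate and capacity concrete, the sketch below evaluates the standard capacity formula for a binary symmetric channel, C = 1 - H(p), where H(p) is the binary entropy function and p is the transition probability. The formula is assumed here from information theory; the transition probabilities and the (7,4) code are illustrative choices.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy function H(p) in bits; H(0) = H(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(p: float) -> float:
    """Capacity C = 1 - H(p) of a binary symmetric channel, in bits per channel use."""
    return 1.0 - binary_entropy(p)


def rate_below_capacity(n: int, k: int, p: float) -> bool:
    """True if the code rate r = k/n is less than the channel capacity C."""
    return k / n < bsc_capacity(p)


for p in (0.01, 0.1):
    print(p, round(bsc_capacity(p), 4), rate_below_capacity(7, 4, p))
```

A result of True only says that the theorem permits reliable codes at that rate; it does not say that the particular (7,4) code achieves arbitrarily small error probability.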

The most unsatisfactory feature of the channel coding theorem, however, is its 
nonconstructive nature. The theorem asserts the existence of good codes but does not tell 
us how to find them. By good codes we mean families of channel codes that are capable of 
providing reliable transmission of information (i.e., at arbitrarily small probability of 
symbol error) over a noisy channel of interest at bit rates up to a maximum value less than 
the capacity of that channel. The error-control coding techniques described in this chapter 
provide different methods of designing good codes. 

Many of the codes described in this chapter are binary codes, for which the alphabet 
consists only of binary symbols 0 and 1. In such a code, the encoding and decoding 
functions involve the binary arithmetic operations of modulo-2 addition and multiplication 
performed on codewords in the code. 

Throughout this chapter, we use the ordinary plus sign (+) to denote modulo-2 addition. 
The use of this terminology will not lead to confusion because the whole chapter relies on 
binary arithmetic. In so doing, we avoid use of the special symbol ⊕, as we did in previous 
parts of the book. Thus, according to the notation used in this chapter, the rules for 
modulo-2 addition are as follows: 

0 + 0 = 0 
1 + 0 = 1 
0 + 1 = 1 
1 + 1 = 0 

Because 1 + 1 = 0, it follows that 1 = -1. Hence, in binary arithmetic, subtraction is the 
same as addition. The rules for modulo-2 multiplication are as follows: 

0 × 0 = 0 
1 × 0 = 0 
0 × 1 = 0 
1 × 1 = 1 

Division is trivial, in that we have 

1 ÷ 1 = 1 
0 ÷ 1 = 0 

and division by 0 is not permitted. Modulo-2 addition is the EXCLUSIVE-OR operation 
in logic and modulo-2 multiplication is the AND operation. 
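
The correspondence with logic operations can be checked directly. In the sketch below, the helper names `add_mod2` and `mul_mod2` are illustrative; they simply wrap Python's bitwise XOR and AND operators.

```python
def add_mod2(a: int, b: int) -> int:
    """Modulo-2 addition of two bits, i.e., EXCLUSIVE-OR."""
    return a ^ b


def mul_mod2(a: int, b: int) -> int:
    """Modulo-2 multiplication of two bits, i.e., AND."""
    return a & b


bits = (0, 1)
addition_table = {(a, b): add_mod2(a, b) for a in bits for b in bits}
multiplication_table = {(a, b): mul_mod2(a, b) for a in bits for b in bits}

for (a, b), total in addition_table.items():
    print(f"{a} + {b} = {total}    {a} x {b} = {multiplication_table[(a, b)]}")
```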

Linear Block Codes 


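By definition: 

A code is said to be linear if any two codewords in the code can be added in modulo-2 
arithmetic to produce a third codeword in the code. 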


Consider, then, an (n,k) linear block code, in which k bits of the n code bits are always 
identical to the message sequence to be transmitted. The (n - k ) bits in the remaining 
portion are computed from the message bits in accordance with a prescribed encoding rule 
that determines the mathematical structure of the code. Accordingly, these (n - k ) bits are 
referred to as parity-check bits. Block codes in which the message bits are transmitted in 
unaltered form are called systematic codes. For applications requiring both error detection 
and error correction, the use of systematic block codes simplifies implementation of the 
decoder. 

Let $m_0, m_1, \ldots, m_{k-1}$ constitute a block of k arbitrary message bits. Thus, we have $2^k$ 
distinct message blocks. Let this sequence of message bits be applied to a linear block 
encoder, producing an n-bit codeword whose elements are denoted by $c_0, c_1, \ldots, c_{n-1}$. Let 
$b_0, b_1, \ldots, b_{n-k-1}$ denote the (n - k) parity-check bits in the codeword. For the code to 
possess a systematic structure, a codeword is divided into two parts, one of which is 
occupied by the message bits and the other by the parity-check bits. Clearly, we have the 
option of sending the message bits of a codeword before the parity-check bits, or vice versa. 
The former option is illustrated in Figure 10.4, and its use is assumed in the following. 

According to the representation of Figure 10.4, the (n - k) leftmost bits of a codeword 
are identical to the corresponding parity-check bits and the k rightmost bits of the 
codeword are identical to the corresponding message bits. We may therefore write 


$$
c_i = \begin{cases} b_i, & i = 0, 1, \ldots, n-k-1 \\ m_{i+k-n}, & i = n-k, n-k+1, \ldots, n-1 \end{cases} \tag{10.1}
$$


The (n - k ) parity-check bits are linear sums of the k message bits, as shown by the 
generalized relation 

$$
b_i = p_{0i} m_0 + p_{1i} m_1 + \cdots + p_{k-1,i} m_{k-1} \tag{10.2}
$$

where the coefficients are defined as follows: 

$$
p_{ij} = \begin{cases} 1 & \text{if } b_j \text{ depends on } m_i \\ 0 & \text{otherwise} \end{cases} \tag{10.3}
$$

The coefficients $p_{ij}$ are chosen in such a way that the rows of the generator matrix are 
linearly independent and the parity-check equations are unique. The $p_{ij}$ used here should 
not be confused with the p(j|i) introduced in Section 10.3. 

The system of (10.1) and (10.2) defines the mathematical structure of the (n,k) linear 
block code. This system of equations may be rewritten in a compact form using matrix 
notation. To proceed with this reformulation, we respectively define the 1-by-k message 
vector m, the 1-by-(n - k) parity-check vector b, and the 1-by-n code vector c as follows: 

$$ \mathbf{m} = [m_0, m_1, \ldots, m_{k-1}] \tag{10.4} $$

$$ \mathbf{b} = [b_0, b_1, \ldots, b_{n-k-1}] \tag{10.5} $$

$$ \mathbf{c} = [c_0, c_1, \ldots, c_{n-1}] \tag{10.6} $$

Note that all three vectors are row vectors. The use of row vectors is adopted in this 
chapter for the sake of being consistent with the notation commonly used in the coding 
literature. We may thus rewrite the set of simultaneous equations defining the 
parity-check bits in the compact matrix form 

$$ \mathbf{b} = \mathbf{m}\mathbf{P} \tag{10.7} $$


Figure 10.4 Structure of systematic codeword: the parity-check bits $b_0, b_1, \ldots, b_{n-k-1}$ 
are followed by the message bits $m_0, m_1, \ldots, m_{k-1}$. 

The P in (10.7) is the k-by-(n - k) coefficient matrix defined by 

$$
\mathbf{P} = \begin{bmatrix}
p_{00} & p_{01} & \cdots & p_{0,n-k-1} \\
p_{10} & p_{11} & \cdots & p_{1,n-k-1} \\
\vdots & \vdots & & \vdots \\
p_{k-1,0} & p_{k-1,1} & \cdots & p_{k-1,n-k-1}
\end{bmatrix} \tag{10.8}
$$

where the element $p_{ij}$ is 0 or 1. 

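From the definitions given in (10.4)-(10.6), we see that c may be expressed as a 
partitioned row vector in terms of the vectors m and b as follows: 

$$ \mathbf{c} = [\mathbf{b} \mid \mathbf{m}] \tag{10.9} $$

Hence, substituting (10.7) into (10.9) and factoring out the common message vector m, we 
get 

$$ \mathbf{c} = \mathbf{m}[\mathbf{P} \mid \mathbf{I}_k] \tag{10.10} $$

where $\mathbf{I}_k$ is the k-by-k identity matrix: 

$$
\mathbf{I}_k = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix} \tag{10.11}
$$

Define the k-by-n generator matrix 

$$ \mathbf{G} = [\mathbf{P} \mid \mathbf{I}_k] \tag{10.12} $$
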

The generator matrix G of (10.12) is said to be in the canonical form, in that its k rows are 
linearly independent; that is, it is not possible to express any row of the matrix G as a 
linear combination of the remaining rows. Using the definition of the generator matrix G, 
we may simplify (10.10) as 

$$ \mathbf{c} = \mathbf{m}\mathbf{G} \tag{10.13} $$
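
The encoding rule c = mG can be sketched as follows. The coefficient matrix P used here is the one that appears for the (7,4) Hamming code later in this chapter; the function names are illustrative assumptions.

```python
# Coefficient matrix P (k = 4 rows, n - k = 3 columns), taken from the
# (7,4) Hamming code example later in the chapter.
P = [[1, 1, 0],
     [0, 1, 1],
     [1, 1, 1],
     [1, 0, 1]]


def generator_matrix(P):
    """G = [P | I_k]: each row of P followed by the matching identity row."""
    k = len(P)
    return [list(row) + [1 if i == j else 0 for j in range(k)]
            for i, row in enumerate(P)]


def encode(message, G):
    """Code vector c = mG, with every sum taken modulo 2."""
    codeword = [0] * len(G[0])
    for m_bit, row in zip(message, G):
        if m_bit:  # add (XOR) the rows of G selected by the message bits
            codeword = [c ^ g for c, g in zip(codeword, row)]
    return codeword


G = generator_matrix(P)
c = encode([1, 0, 1, 1], G)
print(c)
```

Because G contains the identity matrix in its last k columns, the last k bits of every code vector reproduce the message, which is the systematic structure of (10.9).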


The full set of codewords, referred to simply as the code, is generated in accordance 
with (10.13) by letting the message vector m range through the set of all $2^k$ binary 
k-tuples (1-by-k vectors). Moreover, the sum of any two codewords in the code is another 
codeword. This basic property of linear block codes is called closure. To prove its validity, 
consider a pair of code vectors $\mathbf{c}_i$ and $\mathbf{c}_j$ corresponding to a pair of message vectors $\mathbf{m}_i$ and 
$\mathbf{m}_j$, respectively. Using (10.13), we may express the sum of $\mathbf{c}_i$ and $\mathbf{c}_j$ as 

$$
\mathbf{c}_i + \mathbf{c}_j = \mathbf{m}_i\mathbf{G} + \mathbf{m}_j\mathbf{G} = (\mathbf{m}_i + \mathbf{m}_j)\mathbf{G}
$$

The modulo-2 sum of $\mathbf{m}_i$ and $\mathbf{m}_j$ represents a new message vector. Correspondingly, the 
modulo-2 sum of $\mathbf{c}_i$ and $\mathbf{c}_j$ represents a new code vector. 

There is another way of expressing the relationship between the message bits and 
parity-check bits of a linear block code. Let H denote an (n - k)-by-n matrix, defined as 

$$ \mathbf{H} = [\mathbf{I}_{n-k} \mid \mathbf{P}^T] \tag{10.14} $$

where $\mathbf{P}^T$ is an (n - k)-by-k matrix, representing the transpose of the coefficient matrix P, 
and $\mathbf{I}_{n-k}$ is the (n - k)-by-(n - k) identity matrix. Accordingly, we may perform the 
following multiplication of partitioned matrices: 

$$
\mathbf{H}\mathbf{G}^T = [\mathbf{I}_{n-k} \mid \mathbf{P}^T] \begin{bmatrix} \mathbf{P}^T \\ \mathbf{I}_k \end{bmatrix} = \mathbf{P}^T + \mathbf{P}^T
$$

where we have used the fact that multiplication of a rectangular matrix by an identity 
matrix of compatible dimensions leaves the matrix unchanged. In modulo-2 arithmetic, 
the matrix sum $\mathbf{P}^T + \mathbf{P}^T$ is 0. We therefore have 

$$ \mathbf{H}\mathbf{G}^T = \mathbf{0} \tag{10.15} $$

Equivalently, we have $\mathbf{G}\mathbf{H}^T = \mathbf{0}$, where 0 is a new null matrix. Postmultiplying both sides 
of (10.13) by $\mathbf{H}^T$, the transpose of H, and then using (10.15), we get the inner product 

$$ \mathbf{c}\mathbf{H}^T = \mathbf{m}\mathbf{G}\mathbf{H}^T = \mathbf{0} \tag{10.16} $$

The matrix H is called the parity-check matrix of the code and the equations specified by 
(10.16) are called parity-check equations. 

The generator equation (10.13) and the parity-check detector equation (10.16) are basic 
to the description and operation of a linear block code. These two equations are depicted 
in the form of block diagrams in Figure 10.5a and b, respectively. 
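
The relation between the generator and parity-check matrices can be verified numerically. The sketch below assumes the coefficient matrix P of the (7,4) Hamming code used later in this chapter, builds G = [P | I_k] and H = [I_{n-k} | P^T], and checks that their product vanishes in modulo-2 arithmetic. The helper names are illustrative.

```python
P = [[1, 1, 0],
     [0, 1, 1],
     [1, 1, 1],
     [1, 0, 1]]
k, n_minus_k = len(P), len(P[0])


def identity(size):
    return [[1 if i == j else 0 for j in range(size)] for i in range(size)]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul_mod2(A, B):
    """Matrix product AB with all sums reduced modulo 2."""
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]


G = [p_row + i_row for p_row, i_row in zip(P, identity(k))]                    # [P | I_k]
H = [i_row + pt_row for i_row, pt_row in zip(identity(n_minus_k), transpose(P))]  # [I | P^T]

HGt = matmul_mod2(H, transpose(G))
print(HGt)
```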


Figure 10.5 Block diagram representations of the generator equation (10.13) and the 
parity-check equation (10.16). 

The generator matrix G is used in the encoding operation at the transmitter. On the other 
hand, the parity-check matrix H is used in the decoding operation at the receiver. In the 
context of the latter operation, let r denote the 1-by-n received vector that results from 
sending the code vector c over a noisy binary channel. We express the vector r as the sum 
of the original code vector c and a new vector e, as shown by 

$$ \mathbf{r} = \mathbf{c} + \mathbf{e} \tag{10.17} $$

The vector e is called the error vector or error pattern. The ith element of e equals 0 if the 
corresponding element of r is the same as that of c. On the other hand, the ith element of e 
equals 1 if the corresponding element of r is different from that of c, in which case an 
error is said to have occurred in the ith location. That is, for i = 1, 2, ..., n, we have 

$$
e_i = \begin{cases} 1 & \text{if an error has occurred in the } i\text{th location} \\ 0 & \text{otherwise} \end{cases} \tag{10.18}
$$

The receiver has the task of decoding the code vector c from the received vector r. The 
algorithm commonly used to perform this decoding operation starts with the computation 
of a 1-by-(n - k) vector called the error-syndrome vector or simply the syndrome. The 
importance of the syndrome lies in the fact that it depends only upon the error pattern. 
Given a 1-by-n received vector r, the corresponding syndrome is formally defined as 

$$ \mathbf{s} = \mathbf{r}\mathbf{H}^T \tag{10.19} $$

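Accordingly, the syndrome has the following important properties. 

Property 1: The syndrome depends only on the error pattern and not on the transmitted codeword. 

To prove this property, we first use (10.17) and (10.19), and then (10.16) to write 

$$
\mathbf{s} = (\mathbf{c} + \mathbf{e})\mathbf{H}^T = \mathbf{c}\mathbf{H}^T + \mathbf{e}\mathbf{H}^T = \mathbf{e}\mathbf{H}^T \tag{10.20}
$$

Hence, the parity-check matrix H of a code permits us to compute the syndrome s, which 
depends only upon the error pattern e. 

To expand on Property 1, suppose that the error pattern e contains a pair of errors in 
locations i and j caused by the additive channel noise, as shown by 

$$
\mathbf{e} = [0 \cdots 0\ \underset{i}{1}\ 0 \cdots 0\ \underset{j}{1}\ 0 \cdots 0]
$$

Then, substituting this error pattern into (10.20) yields the syndrome 

$$ \mathbf{s} = \mathbf{h}_i + \mathbf{h}_j $$

where $\mathbf{h}_i$ and $\mathbf{h}_j$ are respectively the ith and jth rows of the matrix $\mathbf{H}^T$. In words, we may 
state the following corollary to Property 1: 

For a linear block code, the syndrome s is equal to the sum of those rows of the transposed 
parity-check matrix $\mathbf{H}^T$ where errors have occurred due to channel noise. 

Property 2: All error patterns that differ by a codeword have the same syndrome. 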

For k message bits, there are $2^k$ distinct code vectors denoted as $\mathbf{c}_i$, where $i = 0, 1, \ldots, 2^k - 1$. 
Correspondingly, for any error pattern e we define the $2^k$ distinct vectors $\mathbf{e}_i$ as follows: 

$$ \mathbf{e}_i = \mathbf{e} + \mathbf{c}_i \quad \text{for } i = 0, 1, \ldots, 2^k - 1 \tag{10.21} $$

The set of vectors $\{\mathbf{e}_i,\ i = 0, 1, \ldots, 2^k - 1\}$ defined in (10.21) is called a coset of the code. In 
other words, a coset has exactly $2^k$ elements that differ at most by a code vector. Thus, an 
(n, k) linear block code has $2^{n-k}$ possible cosets. In any event, multiplying both sides of 
(10.21) by the matrix $\mathbf{H}^T$ and again using (10.16), we get 

$$
\mathbf{e}_i\mathbf{H}^T = \mathbf{e}\mathbf{H}^T + \mathbf{c}_i\mathbf{H}^T = \mathbf{e}\mathbf{H}^T \tag{10.22}
$$

which is independent of the index i. Accordingly, we may say: 

Each coset of the code is characterized by a unique syndrome. 


We may put Properties 1 and 2 in perspective by expanding (10.20). Specifically, with the 
matrix H having the systematic form given in (10.14), where the matrix P is itself defined 
by (10.8), we find from (10.20) that the (n - k) elements of the syndrome s are linear 
combinations of the n elements of the error pattern e, as shown by 

$$
\begin{aligned}
s_0 &= e_0 + e_{n-k}\,p_{00} + e_{n-k+1}\,p_{10} + \cdots + e_{n-1}\,p_{k-1,0} \\
s_1 &= e_1 + e_{n-k}\,p_{01} + e_{n-k+1}\,p_{11} + \cdots + e_{n-1}\,p_{k-1,1} \\
&\ \ \vdots \\
s_{n-k-1} &= e_{n-k-1} + e_{n-k}\,p_{0,n-k-1} + e_{n-k+1}\,p_{1,n-k-1} + \cdots + e_{n-1}\,p_{k-1,n-k-1}
\end{aligned}
\tag{10.23}
$$

This set of (n - k) linear equations clearly shows that the syndrome contains information 
about the error pattern and may, therefore, be used for error detection. However, it should 
be noted that the set of equations (10.23) is underdetermined, in that we have more 
unknowns than equations. Accordingly, there is no unique solution for the error pattern. 
Rather, there are $2^k$ error patterns that satisfy (10.23) and, therefore, result in the same 
syndrome, in accordance with Property 2 and (10.22). In particular, with $2^{n-k}$ possible 
syndrome vectors, the information contained in the syndrome s about the error pattern e is 
not enough for the decoder to compute the exact value of the transmitted code vector. 
Nevertheless, knowledge of the syndrome s reduces the search for the true error pattern e 
from $2^n$ to $2^k$ possibilities, namely the members of a single coset. Given these 
possibilities, the decoder has the task of making the best selection from the coset 
corresponding to s. 
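
Both syndrome properties can be checked by brute force for a small code. The sketch below assumes the parity-check matrix H of the (7,4) Hamming code defined later in this chapter; the two code vectors and the error pattern are illustrative choices, and the helper names are not from the text.

```python
from itertools import product

H = [[1, 0, 0, 1, 0, 1, 1],
     [0, 1, 0, 1, 1, 1, 0],
     [0, 0, 1, 0, 1, 1, 1]]


def syndrome(v):
    """s = vH^T in modulo-2 arithmetic, returned as a tuple."""
    return tuple(sum(h & x for h, x in zip(row, v)) % 2 for row in H)


def add(u, v):
    """Modulo-2 sum of two vectors."""
    return tuple(a ^ b for a, b in zip(u, v))


# Property 1: one error pattern added to two different code vectors.
e = (0, 0, 1, 0, 0, 0, 0)
c_a = (1, 1, 1, 0, 0, 1, 0)
c_b = (0, 1, 1, 0, 1, 0, 0)
same = syndrome(add(c_a, e)) == syndrome(add(c_b, e)) == syndrome(e)

# Property 2: group all 2^n error patterns by syndrome to expose the cosets.
cosets = {}
for pattern in product((0, 1), repeat=7):
    cosets.setdefault(syndrome(pattern), []).append(pattern)

print(same, len(cosets), {len(members) for members in cosets.values()})
```

For n = 7 and k = 4 the grouping produces 2^(n-k) = 8 cosets of 2^k = 16 patterns each, so the syndrome alone narrows the candidate error patterns to one coset but does not single out a pattern.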


Consider a pair of code vectors $\mathbf{c}_1$ and $\mathbf{c}_2$ that have the same number of elements. The 
Hamming distance, denoted by $d(\mathbf{c}_1, \mathbf{c}_2)$, between such a pair of code vectors is defined as 
the number of locations in which their respective elements differ. 

The Hamming weight w(c) of a code vector c is defined as the number of nonzero 
elements in the code vector. Equivalently, we may state that the Hamming weight of a 
code vector is the distance between the code vector and the all-zero code vector. In a 
corresponding way, we may introduce a new parameter called the minimum distance $d_{\min}$, 
for which we make the statement: 

The minimum distance $d_{\min}$ of a linear block code is the smallest Hamming distance 
between any pair of codewords. 

That is, the minimum distance is the same as the smallest Hamming weight of the difference 
between any pair of code vectors. From the closure property of linear block codes, the sum 
(or difference) of two code vectors is another code vector. Accordingly, we may also state: 

The minimum distance of a linear block code is the smallest Hamming weight of the 
nonzero code vectors in the code. 


The minimum distance $d_{\min}$ is related to the structure of the parity-check matrix H of the 
code in a fundamental way. From (10.16) we know that a linear block code is defined by 
the set of all code vectors for which $\mathbf{c}\mathbf{H}^T = \mathbf{0}$, where $\mathbf{H}^T$ is the transpose of the parity-check 
matrix H. Let the matrix H be expressed in terms of its columns as shown by 

$$ \mathbf{H} = [\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n] $$

Then, for a code vector c to satisfy the condition $\mathbf{c}\mathbf{H}^T = \mathbf{0}$, the vector c must have ones in 
such positions that the corresponding rows of $\mathbf{H}^T$ sum to the zero vector 0. However, by 
definition, the number of ones in a code vector is the Hamming weight of the code vector. 
Moreover, the smallest Hamming weight of the nonzero code vectors in a linear block 
code equals the minimum distance of the code. Hence, we have another useful result stated 
as follows: 

The minimum distance of a linear block code is defined by the minimum number of rows 
of the matrix $\mathbf{H}^T$ whose sum is equal to the zero vector. 


From this discussion, it is apparent that the minimum distance $d_{\min}$ of a linear block code 
is an important parameter of the code. Specifically, $d_{\min}$ determines the error-correcting 
capability of the code. Suppose an (n, k) linear block code is required to detect and correct 
all error patterns over a binary symmetric channel whose Hamming weight is less 
than or equal to t. That is, if a code vector $\mathbf{c}_i$ in the code is transmitted and the received 
vector is $\mathbf{r} = \mathbf{c}_i + \mathbf{e}$, we require that the decoder output $\hat{\mathbf{c}} = \mathbf{c}_i$ whenever the error pattern e 
has a Hamming weight 

$$ w(\mathbf{e}) \le t $$

We assume that the $2^k$ code vectors in the code are transmitted with equal probability. The 
best strategy for the decoder then is to pick the code vector closest to the received vector r; 
that is, the one for which the Hamming distance $d(\mathbf{c}_i, \mathbf{r})$ is the smallest. With such a 
strategy, the decoder will be able to detect and correct all error patterns of Hamming weight 
$w(\mathbf{e}) \le t$, provided that the minimum distance of the code is equal to or greater than 2t + 1. We 
may demonstrate the validity of this requirement by adopting a geometric interpretation of 
the problem. In particular, the transmitted 1-by-n code vector and the 1-by-n received 
vector are represented as points in an n-dimensional space. Suppose that we construct two 
spheres, each of radius t, around the points that represent code vectors $\mathbf{c}_i$ and $\mathbf{c}_j$ under two 
different conditions: 

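1. Let these two spheres be disjoint, as depicted in Figure 10.6a. For this condition to 
be satisfied, we require that $d(\mathbf{c}_i, \mathbf{c}_j) \ge 2t + 1$. If, then, the code vector $\mathbf{c}_i$ is transmitted 
and the Hamming distance $d(\mathbf{c}_i, \mathbf{r}) \le t$, it is clear that the decoder will pick $\mathbf{c}_i$, as it is 
the code vector closest to the received vector r. 

2. If, on the other hand, the Hamming distance $d(\mathbf{c}_i, \mathbf{c}_j) \le 2t$, the two spheres around $\mathbf{c}_i$ 
and $\mathbf{c}_j$ intersect, as depicted in Figure 10.6b. In this second situation, we see that if $\mathbf{c}_i$ 
is transmitted, there exists a received vector r such that the Hamming distance 
$d(\mathbf{c}_i, \mathbf{r}) \le t$, yet r is as close to $\mathbf{c}_j$ as it is to $\mathbf{c}_i$. Clearly, there is now the possibility of 
the decoder picking the vector $\mathbf{c}_j$, which is wrong. 

Figure 10.6 (a) Hamming distance $d(\mathbf{c}_i, \mathbf{c}_j) \ge 2t + 1$. (b) Hamming distance 
$d(\mathbf{c}_i, \mathbf{c}_j) < 2t$. The received vector is denoted by r. 

We summarize the ideas presented thus far by saying: 

An (n, k) linear block code has the power to correct all error patterns of weight t or less 
if, and only if, $d(\mathbf{c}_i, \mathbf{c}_j) \ge 2t + 1$ for all $\mathbf{c}_i$ and $\mathbf{c}_j$. 

By definition, however, the smallest distance between any pair of code vectors in a code is 
the minimum distance $d_{\min}$ of the code. We may, therefore, go on to state: 

An (n, k) linear block code of minimum distance $d_{\min}$ can correct up to t errors if, and 
only if, 

$$ t \le \left\lfloor \tfrac{1}{2}(d_{\min} - 1) \right\rfloor \tag{10.25} $$

where $\lfloor \cdot \rfloor$ denotes the largest integer less than or equal to the enclosed quantity. 

The condition described in (10.25) is important because it gives the error-correcting 
capability of a linear block code a quantitative meaning. 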


We are now ready to describe a syndrome-based decoding scheme for linear block codes. 
Let $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_{2^k}$ denote the $2^k$ code vectors of an (n, k) linear block code. Let r denote 
the received vector, which may have one of $2^n$ possible values. The receiver has the task of 
partitioning the $2^n$ possible received vectors into $2^k$ disjoint subsets $D_1, D_2, \ldots, D_{2^k}$ in 
such a way that the ith subset $D_i$ corresponds to code vector $\mathbf{c}_i$ for $1 \le i \le 2^k$. The received 
vector r is decoded into $\mathbf{c}_i$ if it is in the ith subset. For the decoding to be correct, r must be 
in the subset that belongs to the code vector $\mathbf{c}_i$ that was actually sent. 

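The $2^k$ subsets described herein constitute a standard array of the linear block code. To 
construct it, we exploit the linear structure of the code by proceeding as follows: 

1. The $2^k$ code vectors are placed in a row with the all-zero code vector $\mathbf{c}_1$ as the 
leftmost element. 

2. An error pattern $\mathbf{e}_2$ is picked and placed under $\mathbf{c}_1$, and a second row is formed by 
adding $\mathbf{e}_2$ to each of the remaining code vectors in the first row; it is important that 
the error pattern chosen as the first element in a row has not previously appeared in 
the standard array. 

3. Step 2 is repeated until all the possible error patterns have been accounted for. 

$$
\begin{array}{ccccccc}
\mathbf{c}_1 = \mathbf{0} & \mathbf{c}_2 & \mathbf{c}_3 & \cdots & \mathbf{c}_i & \cdots & \mathbf{c}_{2^k} \\
\mathbf{e}_2 & \mathbf{c}_2 + \mathbf{e}_2 & \mathbf{c}_3 + \mathbf{e}_2 & \cdots & \mathbf{c}_i + \mathbf{e}_2 & \cdots & \mathbf{c}_{2^k} + \mathbf{e}_2 \\
\mathbf{e}_3 & \mathbf{c}_2 + \mathbf{e}_3 & \mathbf{c}_3 + \mathbf{e}_3 & \cdots & \mathbf{c}_i + \mathbf{e}_3 & \cdots & \mathbf{c}_{2^k} + \mathbf{e}_3 \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
\mathbf{e}_j & \mathbf{c}_2 + \mathbf{e}_j & \mathbf{c}_3 + \mathbf{e}_j & \cdots & \mathbf{c}_i + \mathbf{e}_j & \cdots & \mathbf{c}_{2^k} + \mathbf{e}_j \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
\mathbf{e}_{2^{n-k}} & \mathbf{c}_2 + \mathbf{e}_{2^{n-k}} & \mathbf{c}_3 + \mathbf{e}_{2^{n-k}} & \cdots & \mathbf{c}_i + \mathbf{e}_{2^{n-k}} & \cdots & \mathbf{c}_{2^k} + \mathbf{e}_{2^{n-k}}
\end{array}
$$

Figure 10.7 Standard array for an (n, k) block code. 

Figure 10.7 illustrates the structure of the standard array so constructed. The $2^k$ columns of 
this array represent the disjoint subsets $D_1, D_2, \ldots, D_{2^k}$. The $2^{n-k}$ rows of the array 
represent the cosets of the code, and their first elements $\mathbf{e}_2, \ldots, \mathbf{e}_{2^{n-k}}$ are called coset 
leaders. 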

For a given channel, the probability of decoding error is minimized when the most 
likely error patterns (i.e., those with the largest probability of occurrence) are chosen as 
the coset leaders. In the case of a binary symmetric channel, the smaller the Hamming 
weight of an error pattern, the more likely that pattern is to occur. 
Accordingly, the standard array should be constructed with each coset leader having the 
minimum Hamming weight in its coset. 
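
One way to obtain minimum-weight coset leaders is sketched below: error patterns are visited in order of increasing Hamming weight, and the first pattern seen for each syndrome is kept as the leader of its coset. This is an illustrative shortcut rather than the row-by-row construction described above, and it assumes the parity-check matrix H of the (7,4) Hamming code defined later in this chapter.

```python
from itertools import product

H = [[1, 0, 0, 1, 0, 1, 1],
     [0, 1, 0, 1, 1, 1, 0],
     [0, 0, 1, 0, 1, 1, 1]]


def syndrome(v):
    """s = vH^T in modulo-2 arithmetic, returned as a tuple."""
    return tuple(sum(h & x for h, x in zip(row, v)) % 2 for row in H)


# Visit error patterns in order of increasing Hamming weight, so the first
# pattern recorded for each syndrome has minimum weight within its coset.
leaders = {}
for e in sorted(product((0, 1), repeat=7), key=sum):
    leaders.setdefault(syndrome(e), e)

for s, e in sorted(leaders.items()):
    print(s, e)
```

When several patterns of the same minimum weight share a coset, this sketch keeps whichever comes first in the sorted order; for the code assumed here, every leader turns out to have weight 0 or 1.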

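We are now ready to describe a decoding procedure for linear block codes: 

1. For the received vector r, compute the syndrome $\mathbf{s} = \mathbf{r}\mathbf{H}^T$. 

2. Within the coset characterized by the syndrome s, identify the coset leader (i.e., the 
error pattern with the largest probability of occurrence); call it $\mathbf{e}_0$. 

3. Compute the code vector 

$$ \hat{\mathbf{c}} = \mathbf{r} + \mathbf{e}_0 \tag{10.26} $$

as the decoded version of the received vector r. 

This procedure is called syndrome decoding. 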

Hamming Codes 

For any positive integer m ≥ 3, there exists a linear block code with the following 
parameters: 

code length                    $n = 2^m - 1$ 
number of message bits         $k = 2^m - m - 1$ 
number of parity-check bits    $n - k = m$ 

Such a linear block code for which the error-correcting capability t = 1 is called a 
Hamming code. To be specific, consider the example of m = 3, yielding the (7,4) 
Hamming code with n = 7 and k = 4. The generator of this code is defined by 

$$
\mathbf{G} = \left[\begin{array}{ccc|cccc}
1 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1
\end{array}\right] = [\mathbf{P} \mid \mathbf{I}_k]
$$

which conforms to the systematic structure of (10.12). 
The corresponding parity-check matrix is given by 

$$
\mathbf{H} = \left[\begin{array}{ccc|cccc}
1 & 0 & 0 & 1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 & 1
\end{array}\right] = [\mathbf{I}_{n-k} \mid \mathbf{P}^T]
$$

The operative property embodied in this equation is that the columns of the parity-check 
matrix H consist of all the nonzero m-tuples, where m = 3. 

With k = 4, there are $2^k = 16$ distinct message words, which are listed in Table 10.1. For 
a given message word, the corresponding codeword is obtained by using (10.13). Thus, the 
application of this equation results in the 16 codewords listed in Table 10.1. 

In Table 10.1, we have also listed the Hamming weights of the individual codewords in 
the (7,4) Hamming code. Since the smallest of the Hamming weights for the nonzero 
codewords is 3, it follows that the minimum distance of the code is 3, which is what it 
should be by definition. Indeed, all Hamming codes have the property that the minimum 
distance $d_{\min} = 3$, independent of the value assigned to the number of parity bits m. 
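
The codeword list and its weights can be regenerated from the generator matrix. The sketch below applies c = mG to all 16 message words of the (7,4) Hamming code; the variable and function names are illustrative.

```python
from collections import Counter
from itertools import product

# Generator matrix G = [P | I_4] of the (7,4) Hamming code.
G = [[1, 1, 0, 1, 0, 0, 0],
     [0, 1, 1, 0, 1, 0, 0],
     [1, 1, 1, 0, 0, 1, 0],
     [1, 0, 1, 0, 0, 0, 1]]


def encode(message):
    """Code vector c = mG in modulo-2 arithmetic, returned as a tuple."""
    return tuple(sum(m & g for m, g in zip(message, column)) % 2
                 for column in zip(*G))


table = [(m, encode(m)) for m in product((0, 1), repeat=4)]
weights = Counter(sum(c) for _, c in table)
d_min = min(sum(c) for _, c in table if any(c))

for m, c in table:
    print("".join(map(str, m)), "".join(map(str, c)), sum(c))
print("weight distribution:", dict(weights), "minimum distance:", d_min)
```

Counting the weights this way gives a quick consistency check on a hand-built table: the nonzero codewords of this code split into seven of weight 3, seven of weight 4, and one of weight 7.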

Table 10.1 Codewords of a (7,4) Hamming code 

Message word    Codeword    Weight of codeword 
0000            0000000     0 
0001            1010001     3 
0010            1110010     4 
0011            0100011     3 
0100            0110100     3 
0101            1100101     4 
0110            1000110     3 
0111            0010111     4 
1000            1101000     3 
1001            0111001     4 
1010            0011010     3 
1011            1001011     4 
1100            1011100     4 
1101            0001101     3 
1110            0101110     4 
1111            1111111     7 

To illustrate the relation between the minimum distance $d_{\min}$ and the structure of the 
parity-check matrix H, consider the codeword 0110100. In matrix multiplication, defined 
by (10.16), the nonzero elements of this codeword “sift” out the second, third, and fifth 
columns of the matrix H, yielding 

$$
\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$


We may perform similar calculations for the remaining 14 nonzero codewords. We thus 
find that the smallest number of columns in H that sums to zero is 3, reconfirming the 
defining condition $d_{\min} = 3$. 
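
The same conclusion can be reached without listing codewords, by searching for the smallest set of columns of H whose modulo-2 sum is the zero vector. The exhaustive search below is a sketch that is practical only for short codes; the function name is an illustrative assumption.

```python
from itertools import combinations

H = [[1, 0, 0, 1, 0, 1, 1],
     [0, 1, 0, 1, 1, 1, 0],
     [0, 0, 1, 0, 1, 1, 1]]


def smallest_zero_sum_column_set(H):
    """Size of the smallest set of columns of H that sums to zero modulo 2."""
    columns = list(zip(*H))
    for size in range(1, len(columns) + 1):
        for subset in combinations(columns, size):
            if all(sum(bits) % 2 == 0 for bits in zip(*subset)):
                return size
    return None  # no dependent set: only possible if H has full column rank


columns = list(zip(*H))
# Second, third, and fifth columns (indices 1, 2, 4), as in the worked example.
example_sum = tuple(sum(bits) % 2 for bits in zip(columns[1], columns[2], columns[4]))

print(smallest_zero_sum_column_set(H), example_sum)
```

No single column is zero and no two columns are equal, so the search cannot succeed before size 3.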

An important property of binary Hamming codes is that they satisfy the condition of 
(10.25) with the equality sign, assuming that t = 1. Thus, assuming single-error patterns, 
we may formulate the error patterns listed in the right-hand column of Table 10.2. The 
corresponding eight syndromes, listed in the left-hand column, are calculated in 
accordance with (10.20). The zero syndrome signifies no transmission errors. 

Suppose, for example, the code vector [1110010] is sent and the received vector is 
[1100010] with an error in the third bit. Using (10.19), the syndrome is calculated to be 

$$
\mathbf{s} = [1\ 1\ 0\ 0\ 0\ 1\ 0]
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 1 & 0 \\
0 & 1 & 1 \\
1 & 1 & 1 \\
1 & 0 & 1
\end{bmatrix}
= [0\ 0\ 1]
$$


From Table 10.2 the corresponding coset leader (i.e., error pattern with the highest 
probability of occurrence) is found to be [0010000], indicating correctly that the third bit 
of the received vector is erroneous. Thus, adding this error pattern to the received vector, 
in accordance with (10.26), yields the correct code vector actually sent. 
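
The worked example can be reproduced with a short syndrome decoder. The sketch below builds the decoding table for single-error patterns from the parity-check matrix H of the (7,4) Hamming code and then corrects the received vector of the example; the function and variable names are illustrative assumptions.

```python
H = [[1, 0, 0, 1, 0, 1, 1],
     [0, 1, 0, 1, 1, 1, 0],
     [0, 0, 1, 0, 1, 1, 1]]
n = len(H[0])


def syndrome(r):
    """s = rH^T in modulo-2 arithmetic, returned as a tuple."""
    return tuple(sum(h & x for h, x in zip(row, r)) % 2 for row in H)


# Decoding table: syndrome -> coset leader (no error, or a single error).
decoding_table = {(0, 0, 0): [0] * n}
for position in range(n):
    e = [0] * n
    e[position] = 1
    decoding_table[syndrome(e)] = e


def decode(r):
    """Add the coset leader selected by the syndrome to the received vector."""
    e0 = decoding_table[syndrome(r)]
    return [a ^ b for a, b in zip(r, e0)]


received = [1, 1, 0, 0, 0, 1, 0]
print(syndrome(received), decode(received))
```

If two or more bits are in error, this decoder still returns a codeword, but not the one that was sent, since the code corrects only t = 1 error.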


Table 10.2 Decoding table for the (7,4) Hamming code defined in Table 10.1 

Syndrome    Error pattern 
000         0000000 
100         1000000 
010         0100000 
001         0010000 
110         0001000 
011         0000100 
111         0000010 
101         0000001 

Cyclic Codes 

Cyclic codes form a subclass of linear block codes. Indeed, many of the important linear 
block codes discovered to date are either cyclic codes or closely related to cyclic codes. 
An advantage of cyclic codes over most other types of codes is that they are easy to 
encode. Furthermore, cyclic codes possess a well-defined mathematical structure, which 
has led to the development of very efficient decoding schemes for them. 

A binary code is said to be a cyclic code if it exhibits two fundamental properties: 

Property 1: Linearity Property 

The sum of any two codewords in the code is also a codeword. 

Property 2: Cyclic Property 

Any cyclic shift of a codeword in the code is also a codeword. 

Property 1 restates the fact that a cyclic code is a linear block code (i.e., it can be described
as a parity-check code). To restate Property 2 in mathematical terms, let the n-tuple
(c_0, c_1, ..., c_{n-1}) denote a codeword of an (n, k) linear block code. The code is a cyclic
code if the n-tuples

    (c_{n-1}, c_0, \ldots, c_{n-2}),
    (c_{n-2}, c_{n-1}, \ldots, c_{n-3}),
    \vdots
    (c_1, c_2, \ldots, c_{n-1}, c_0)

are all codewords in the code.

To develop the algebraic properties of cyclic codes, we use the elements
c_0, c_1, ..., c_{n-1} of a codeword to define the code polynomial

    c(X) = c_0 + c_1 X + c_2 X^2 + \cdots + c_{n-1} X^{n-1}    (10.27)

where X is an indeterminate. Naturally, for binary codes, the coefficients are 1s and 0s.
Each power of X in the polynomial c(X) represents a one-bit shift in time. Hence,
multiplication of the polynomial c(X) by X may be viewed as a shift to the right. The key
question is: How do we make such a shift cyclic? The answer to this question is addressed
next.

Let the code polynomial c(X) in (10.27) be multiplied by X^i, yielding

    X^i c(X) = c_0 X^i + c_1 X^{i+1} + \cdots + c_{n-i-1} X^{n-1}
               + c_{n-i} X^n + \cdots + c_{n-1} X^{n+i-1}

Recognizing, for example, that c_{n-i} + c_{n-i} = 0 in modulo-2 addition, we may
manipulate the preceding equation into the following compact form:

    X^i c(X) = q(X)(X^n + 1) + c^{(i)}(X)    (10.28)

where the polynomial q(X) is defined by

    q(X) = c_{n-i} + c_{n-i+1} X + \cdots + c_{n-1} X^{i-1}
As for the polynomial c^{(i)}(X) in (10.28), it is recognized as the code polynomial of the
codeword (c_{n-i}, ..., c_{n-1}, c_0, c_1, ..., c_{n-i-1}) obtained by applying i cyclic shifts
to the codeword (c_0, c_1, ..., c_{n-i-1}, c_{n-i}, ..., c_{n-1}). Moreover, from (10.28) we
readily see that c^{(i)}(X) is the remainder that results from dividing X^i c(X) by (X^n + 1).
We may thus formally state the cyclic property in polynomial notation as follows:

If c(X) is a code polynomial, then the polynomial

    c^{(i)}(X) = X^i c(X) mod (X^n + 1)    (10.30)

is also a code polynomial for any cyclic shift i.

The special form of polynomial multiplication described in (10.30) is referred to as
multiplication modulo X^n + 1. In effect, the multiplication is subject to the constraint
X^n = 1, the application of which restores the polynomial X^i c(X) to order n - 1 for all
i < n. Note that, in modulo-2 arithmetic, X^n + 1 has the same value as X^n - 1.
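To make the constraint X^n = 1 concrete, the following minimal sketch multiplies a code polynomial by X^i and then folds every power X^(n+j) back onto X^j. The coefficient-list representation (index = power of X) and the function names are assumptions made for this illustration only.

```python
def poly_mod_xn_plus_1(coeffs, n):
    """Reduce a GF(2) polynomial modulo X^n + 1 by folding X^(n+j) onto X^j."""
    out = [0] * n
    for power, c in enumerate(coeffs):
        out[power % n] ^= c
    return out


def cyclic_shift_poly(codeword, i):
    """Compute X^i c(X) mod (X^n + 1); codeword[j] is the coefficient of X^j."""
    n = len(codeword)
    shifted = [0] * i + list(codeword)  # multiplication by X^i
    return poly_mod_xn_plus_1(shifted, n)


c = [1, 1, 0, 1, 0, 0, 0]           # c(X) = 1 + X + X^3, n = 7
print(cyclic_shift_poly(c, 1))      # [0, 1, 1, 0, 1, 0, 0]
print(cyclic_shift_poly(c, 5))      # [0, 1, 0, 0, 0, 1, 1]
```

The result for any i coincides with rotating the coefficient list i places to the right, which is the cyclic shift that the polynomial notation encodes.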


The polynomial X^n + 1 and its factors play a major role in the generation of cyclic codes.
Let g(X) be a polynomial of degree n - k that is a factor of X^n + 1; as such, g(X) is the
polynomial of least degree in the code. In general, g(X) may be expanded as follows:

    g(X) = 1 + \sum_{i=1}^{n-k-1} g_i X^i + X^{n-k}

where the coefficient g_i is equal to 0 or 1 for i = 1, ..., n - k - 1. According to this
expansion, the polynomial g(X) has two terms with coefficient 1 separated by n - k - 1
terms. The polynomial g(X) is called the generator polynomial of a cyclic code. A cyclic
code is uniquely determined by the generator polynomial g(X) in that each code
polynomial in the code can be expressed in the form of a polynomial product as follows:

    c(X) = a(X)g(X)    (10.32)

where a(X) is a polynomial in X with degree k - 1 or less. The c(X) so formed satisfies the
condition of (10.30) since g(X) is a factor of X^n + 1.

Suppose we are given the generator polynomial g(X) and the requirement is to encode
the message sequence (m_0, m_1, ..., m_{k-1}) into an (n, k) systematic cyclic code. That is,
the message bits are transmitted in unaltered form, as shown by the following structure for
a codeword (see Figure 10.4):

    (b_0, b_1, \ldots, b_{n-k-1}, m_0, m_1, \ldots, m_{k-1})
     |--- n - k parity-check bits ---|  |--- k message bits ---|

Let the message polynomial be defined by

    m(X) = m_0 + m_1 X + \cdots + m_{k-1} X^{k-1}

and let

    b(X) = b_0 + b_1 X + \cdots + b_{n-k-1} X^{n-k-1}
Then, according to (10.1), we want the code polynomial to be in the form

    c(X) = b(X) + X^{n-k} m(X)    (10.35)

To this end, the use of (10.32) and (10.35) yields

    a(X)g(X) = b(X) + X^{n-k} m(X)

Equivalently, invoking modulo-2 addition, we may also write

    \frac{X^{n-k} m(X)}{g(X)} = a(X) + \frac{b(X)}{g(X)}    (10.36)

Equation (10.36) states that the polynomial b(X) is the remainder left over after dividing
X^{n-k} m(X) by g(X).

We may now summarize the steps involved in the encoding procedure for an (n,k)
cyclic code, assured of a systematic structure. Specifically, we proceed as follows:

1. Premultiply the message polynomial m(X) by X^{n-k}.
2. Divide X^{n-k} m(X) by the generator polynomial g(X), obtaining the remainder b(X).
3. Add b(X) to X^{n-k} m(X), obtaining the code polynomial c(X).
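The three steps above can be sketched as follows. This is a minimal illustration rather than a reference implementation: polynomials are stored as coefficient lists with the index equal to the power of X, the helper names `gf2_divmod` and `encode_systematic` are hypothetical, and the caller is assumed to supply a g(X) of degree n - k that divides X^n + 1.

```python
def gf2_divmod(dividend, divisor):
    """Divide GF(2) polynomials given as coefficient lists (index = power of X).

    Returns (quotient, remainder). The last entry of divisor must be 1.
    """
    rem = list(dividend)
    deg_d = len(divisor) - 1
    quot = [0] * max(len(rem) - deg_d, 1)
    for shift in range(len(rem) - deg_d - 1, -1, -1):
        if rem[shift + deg_d]:
            quot[shift] = 1
            for j, d in enumerate(divisor):
                rem[shift + j] ^= d  # modulo-2 subtraction is the same as addition
    return quot, rem[:deg_d]


def encode_systematic(message, g, n):
    """Return the codeword (b_0..b_{n-k-1}, m_0..m_{k-1}) for generator g(X)."""
    k = len(message)
    shifted = [0] * (n - k) + list(message)   # Step 1: X^(n-k) m(X)
    _, parity = gf2_divmod(shifted, g)        # Step 2: remainder b(X)
    return parity + list(message)             # Step 3: b(X) + X^(n-k) m(X)


# (7,4) code with g(X) = 1 + X + X^3 and message 1001
print(encode_systematic([1, 0, 0, 1], [1, 1, 0, 1], 7))  # [0, 1, 1, 1, 0, 0, 1]
```

Because the parity bits occupy the low-order positions and the message occupies the high-order positions, Step 3 reduces to list concatenation.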


An (n,k) cyclic code is uniquely specified by its generator polynomial g(X) of degree (n - k).
Such a code is also uniquely specified by another polynomial of degree k, which is called
the parity-check polynomial, defined by

    h(X) = 1 + \sum_{i=1}^{k-1} h_i X^i + X^k    (10.37)

where the coefficients h_i are 0 or 1. The parity-check polynomial h(X) has a form similar
to the generator polynomial, in that there are two terms with coefficient 1, but separated by
k - 1 terms.

The generator polynomial g(X) is equivalent to the generator matrix G as a description
of the code. Correspondingly, the parity-check polynomial h(X) is an equivalent
representation of the parity-check matrix H. We thus find that the matrix relation HG^T = 0
presented in (10.15) for linear block codes corresponds to the relationship

    g(X)h(X) mod (X^n + 1) = 0

Accordingly, we may make the statement:

The generator polynomial g(X) and the parity-check polynomial h(X) are factors of the
polynomial X^n + 1, as shown by

    g(X)h(X) = X^n + 1    (10.39)

This statement provides the basis for selecting the generator or parity-check polynomial of
a cyclic code. In particular, if g(X) is a polynomial of degree (n - k) and it is also a factor
of X^n + 1, then g(X) is the generator polynomial of an (n,k) cyclic code. Equivalently, if
h(X) is a polynomial of degree k and it is also a factor of X^n + 1, then h(X) is the
parity-check polynomial of an (n,k) cyclic code.

A final comment is in order. Any factor of X^n + 1 with degree (n - k) can be used as a
generator polynomial. The fact of the matter is that, for large values of n, the polynomial
X^n + 1 may have many factors of degree n - k. Some of these polynomial factors generate
good cyclic codes, whereas some of them generate bad cyclic codes. The issue of how to
select generator polynomials that produce good cyclic codes is very difficult to resolve.
Indeed, coding theorists have expended much effort in the search for good cyclic codes.


Given the generator polynomial g(X) of an (n,k) cyclic code, we may construct the
generator matrix G of the code by noting that the k polynomials g(X), Xg(X), ..., X^{k-1}g(X)
span the code. Hence, the n-tuples corresponding to these polynomials may be used as
rows of the k-by-n generator matrix G.

However, the construction of the parity-check matrix H of the cyclic code from the
parity-check polynomial h(X) requires special attention, as described here. Multiplying
(10.39) by a(X) and then using (10.32), we obtain

    c(X)h(X) = a(X) + X^n a(X)    (10.40)

The polynomials c(X) and h(X) are themselves defined by (10.27) and (10.37) respectively,
which means that their product on the left-hand side of (10.40) contains terms with powers
extending up to n + k - 1. On the other hand, the polynomial a(X) has degree k - 1 or less,
the implication of which is that the powers X^k, X^{k+1}, ..., X^{n-1} do not appear in the
polynomial on the right-hand side of (10.40). Thus, setting the coefficients of
X^k, X^{k+1}, ..., X^{n-1} in the expansion of the product polynomial c(X)h(X) equal to zero, we
obtain the following set of n - k equations:

    \sum_{i=j}^{j+k} c_i h_{k+j-i} = 0    for 0 \le j \le n-k-1    (10.41)

where h_0 = h_k = 1 in accordance with (10.37). Comparing (10.41) with the corresponding
relation (10.16), we may make the following important observation:

The coefficients of the parity-check polynomial h(X) involved in the polynomial
multiplication described in (10.41) are arranged in reversed order with respect to the
elements of the rows of the parity-check matrix H involved in forming the inner product of
vectors described in (10.16).

This observation suggests that we define the reciprocal of the parity-check polynomial as
follows:

    X^k h(X^{-1}) = X^k \left( 1 + \sum_{i=1}^{k-1} h_i X^{-i} + X^{-k} \right)
                  = 1 + \sum_{i=1}^{k-1} h_{k-i} X^i + X^k

which is also a factor of X^n + 1. The n-tuples pertaining to the (n - k) polynomials
X^k h(X^{-1}), X^{k+1} h(X^{-1}), ..., X^{n-1} h(X^{-1}) may now be used in rows of the
(n - k)-by-n parity-check matrix H.

In general, the generator matrix G and the parity-check matrix H constructed in the 
manner described here are not in their systematic forms. They can be put into their 
systematic forms by performing simple operations on their respective rows, as illustrated 
in Example 1.


Earlier we showed that the encoding procedure for an (n,k) cyclic code in systematic form
involves three steps:

• multiplication of the message polynomial m(X) by X^{n-k},

• division of X^{n-k} m(X) by the generator polynomial g(X) to obtain the remainder
b(X), and

• addition of b(X) to X^{n-k} m(X) to form the desired code polynomial.

These three steps can be implemented by means of the encoder shown in Figure 10.8, 
consisting of a linear feedback shift register with (n - k) stages. 

The boxes in Figure 10.8 represent flip-flops, or unit-delay elements. The flip-flop is a
device that resides in one of two possible states denoted by 0 and 1. An external clock (not
shown in Figure 10.8) controls the operation of all the flip-flops. Every time the clock ticks,
the contents of the flip-flops (initially set to the state 0) are shifted out in the direction of the
arrows. In addition to the flip-flops, the encoder of Figure 10.8 includes a second set of logic
elements, namely adders, which compute the modulo-2 sums of their respective inputs.
Finally, the multipliers multiply their respective inputs by the associated coefficients. In
particular, if the coefficient g_i = 1, the multiplier is just a direct "connection." If, on the
other hand, the coefficient g_i = 0, the multiplier is "no connection."

The operation of the encoder shown in Figure 10.8 proceeds as follows:

1. The gate is switched on. Hence, the k message bits are shifted into the channel. As
   soon as the k message bits have entered the shift register, the resulting (n - k) bits in
   the register form the parity-check bits. (Recall that the parity-check bits are the same
   as the coefficients of the remainder b(X).)
2. The gate is switched off, thereby breaking the feedback connections.
3. The contents of the shift register are read out into the channel.

Figure 10.8 Encoder for an (n,k) cyclic code.


Suppose the codeword (c_0, c_1, ..., c_{n-1}) is transmitted over a noisy channel, resulting in the
received word (r_0, r_1, ..., r_{n-1}). From Section 10.3, we recall that the first step in the decoding
of a linear block code is to calculate the syndrome for the received word. If the syndrome is
zero, there are no transmission errors in the received word. If, on the other hand, the syndrome
is nonzero, the received word contains transmission errors that require correction.

In the case of a cyclic code in systematic form, the syndrome can be calculated easily.
Let the received vector be represented by a polynomial of degree n - 1 or less, as shown by

    r(X) = r_0 + r_1 X + \cdots + r_{n-1} X^{n-1}

Let q(X) denote the quotient and s(X) denote the remainder, which are the results of
dividing r(X) by the generator polynomial g(X). We may therefore express r(X) as follows:

    r(X) = q(X)g(X) + s(X)    (10.43)

The remainder s(X) is a polynomial of degree n - k - 1 or less, which is the result of interest.
It is called the syndrome polynomial because its coefficients make up the (n - k)-by-1
syndrome s.

Figure 10.9 shows a syndrome calculator that is identical to the encoder of Figure 10.8 
except for the fact that the received bits are fed into the n-k stages of the feedback shift 
register from the left. As soon as all the received bits have been shifted into the shift 
register, its contents define the syndrome s. 
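A minimal simulation of such a syndrome calculator is sketched below. It assumes that the received bits enter at the leftmost stage, highest-order bit r_{n-1} first, and that the feedback is taken from the last stage with taps set by g(X); the function name `lfsr_syndrome` is a hypothetical helper.

```python
def lfsr_syndrome(received, g):
    """Shift r_{n-1}, ..., r_0 into an (n-k)-stage register wired by g(X).

    received: bits r_0..r_{n-1}; g: coefficients g_0..g_{n-k}.
    Returns the final register contents (s_0..s_{n-k-1}) and the history.
    """
    stages = len(g) - 1
    reg = [0] * stages
    history = [tuple(reg)]
    for bit in reversed(received):
        feedback = reg[-1]                 # output of the last stage
        new_reg = [bit ^ feedback]         # received bit enters from the left
        for i in range(1, stages):
            new_reg.append(reg[i - 1] ^ (g[i] & feedback))
        reg = new_reg
        history.append(tuple(reg))
    return reg, history


g = [1, 1, 0, 1]                                    # g(X) = 1 + X + X^3
print(lfsr_syndrome([0, 1, 1, 1, 0, 0, 1], g)[0])   # [0, 0, 0], a valid codeword
print(lfsr_syndrome([0, 1, 1, 0, 0, 0, 1], g)[0])   # [1, 1, 0], one bit in error
```

The register performs the division of r(X) by g(X) one bit at a time, so its final contents are the coefficients of the remainder s(X).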

The syndrome polynomial s(X) has the following useful properties that follow from the
definition given in (10.43).

Property 1

The syndrome of a received word polynomial is also the syndrome of the corresponding
error polynomial.

Given that a code polynomial c(X) of a cyclic code is sent over a noisy channel, the received
word polynomial is defined by

    r(X) = c(X) + e(X)

where e(X) is the error polynomial. Equivalently, we may write

    e(X) = r(X) + c(X)

Hence, substituting (10.32) and (10.43) into the preceding equation, we get

    e(X) = u(X)g(X) + s(X)    (10.44)

where the quotient is u(X) = a(X) + q(X). Equation (10.44) shows that s(X) is also the
syndrome of the error polynomial e(X). The implication of this property is that when the
syndrome polynomial s(X) is nonzero, the presence of transmission errors in the received
vector is detected.

Figure 10.9 Syndrome computer for (n, k) cyclic code.

Property 2

Let s(X) be the syndrome of a received word polynomial r(X). Then, the syndrome of
Xr(X), representing a cyclic shift of r(X), is Xs(X).

Applying a cyclic shift to both sides of (10.43), we get

    Xr(X) = Xq(X)g(X) + Xs(X)

from which we readily see that Xs(X) is the remainder of the division of Xr(X) by g(X).
Hence, the syndrome of Xr(X) is Xs(X) as stated. Strictly, when the degree of Xs(X)
reaches n - k, the syndrome is the remainder of Xs(X) after division by g(X), which is what
one further shift of the syndrome register computes. We may generalize this result by stating
that if s(X) is the syndrome of r(X), then X^i s(X) is the syndrome of X^i r(X).

Property 3

The syndrome polynomial s(X) is identical to the error polynomial e(X), assuming that the
errors are confined to the (n - k) parity-check bits of the received word polynomial r(X).

The assumption made here is another way of saying that the degree of the error
polynomial e(X) is less than or equal to (n - k - 1). Since the generator polynomial g(X)
is of degree (n - k), by definition, it follows that (10.44) can only be satisfied if the
quotient u(X) is zero. In other words, the error polynomial e(X) and the syndrome
polynomial s(X) are one and the same. The implication of Property 3 is that, under the
aforementioned conditions, error correction can be accomplished simply by adding the
syndrome polynomial s(X) to the received vector r(X).
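The three syndrome properties can be checked numerically on a small case. The sketch below uses the (7,4) code with g(X) = 1 + X + X^3 and the codeword 0111001 as an assumed test case; `gf2_remainder` and `syndromes_for` are hypothetical helpers, and Property 2 is checked with Xs(X) reduced modulo g(X).

```python
def gf2_remainder(poly, g):
    """Remainder of poly(X) divided by g(X) over GF(2); index = power of X."""
    rem = list(poly)
    deg_g = len(g) - 1
    for shift in range(len(rem) - deg_g - 1, -1, -1):
        if rem[shift + deg_g]:
            for j, gj in enumerate(g):
                rem[shift + j] ^= gj
    return rem[:deg_g]


g = [1, 1, 0, 1]               # g(X) = 1 + X + X^3
c = [0, 1, 1, 1, 0, 0, 1]      # a codeword of the (7,4) code


def syndromes_for(error):
    """Return (r, syndrome of r, syndrome of e) for the codeword c."""
    r = [ci ^ ei for ci, ei in zip(c, error)]
    return r, gf2_remainder(r, g), gf2_remainder(error, g)


# Property 1: r(X) and e(X) have the same syndrome.
r, s_r, s_e = syndromes_for([0, 0, 0, 1, 0, 0, 0])
print(s_r == s_e)  # True

# Property 2: a cyclic shift of r(X) has syndrome X s(X) mod g(X).
r_shifted = [r[-1]] + r[:-1]
print(gf2_remainder(r_shifted, g) == gf2_remainder([0] + s_r, g))  # True

# Property 3: an error confined to the parity-check bits equals the syndrome.
parity_error = [0, 1, 0, 0, 0, 0, 0]
_, s_parity, _ = syndromes_for(parity_error)
print(s_parity == parity_error[:3])  # True
```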

Hamming Codes Revisited 

To illustrate the issues relating to the polynomial representation of cyclic codes, we
consider the generation of a (7,4) cyclic code. With the block length n = 7, we start by
factorizing X^7 + 1 into three irreducible polynomials:

    X^7 + 1 = (1 + X)(1 + X^2 + X^3)(1 + X + X^3)

By an "irreducible polynomial" we mean a polynomial that cannot be factored using only
polynomials with coefficients from the binary field. An irreducible polynomial of degree
m is said to be primitive if the smallest positive integer n for which the polynomial divides
X^n + 1 is n = 2^m - 1. For the example at hand, the two polynomials (1 + X^2 + X^3) and
(1 + X + X^3) are primitive. Let us take

    g(X) = 1 + X + X^3

as the generator polynomial, whose degree equals the number of parity-check bits. This
means that the parity-check polynomial is given by

    h(X) = (1 + X)(1 + X^2 + X^3)
         = 1 + X + X^2 + X^4

whose degree equals the number of message bits k = 4.
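The factorization and the resulting h(X) are easy to confirm by direct multiplication over the binary field. The following sketch uses coefficient lists (index = power of X); the helper name `gf2_mul` is an assumption made for this illustration.

```python
def gf2_mul(a, b):
    """Multiply two GF(2) polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out


f1 = [1, 1]          # 1 + X
f2 = [1, 0, 1, 1]    # 1 + X^2 + X^3
f3 = [1, 1, 0, 1]    # 1 + X + X^3, taken as g(X)

h = gf2_mul(f1, f2)
print(h)               # [1, 1, 1, 0, 1], i.e. 1 + X + X^2 + X^4
print(gf2_mul(h, f3))  # [1, 0, 0, 0, 0, 0, 0, 1], i.e. 1 + X^7
```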
Next, we illustrate the procedure for the construction of a codeword by using this
generator polynomial to encode the message sequence 1001. The corresponding message
polynomial is given by

    m(X) = 1 + X^3

Hence, multiplying m(X) by X^{n-k} = X^3, we get

    X^{n-k} m(X) = X^3 + X^6

The second step is to divide X^{n-k} m(X) by g(X), the details of which (for the example at
hand) are given below:

                      X^3 + X
                   -----------------------
    X^3 + X + 1  ) X^6             + X^3
                   X^6 + X^4       + X^3
                   -----------------------
                         X^4
                         X^4 + X^2 + X
                         -----------------
                               X^2 + X

Note that in this long division we have treated subtraction the same as addition since we
are operating in modulo-2 arithmetic. We may thus write

    \frac{X^3 + X^6}{1 + X + X^3} = X + X^3 + \frac{X + X^2}{1 + X + X^3}

That is, the quotient a(X) and remainder b(X) are as follows, respectively:

    a(X) = X + X^3
    b(X) = X + X^2

Hence, from (10.35) we find that the desired code polynomial is

    c(X) = b(X) + X^{n-k} m(X)
         = X + X^2 + X^3 + X^6

The codeword is therefore 0111001. The four rightmost bits, 1001, are the specified
message bits. The three leftmost bits, 011, are the parity-check bits. The codeword thus
generated is exactly the same as the corresponding one shown in Table 10.1 for a (7,4)
Hamming code.

We may generalize this result by stating that:

Any cyclic code generated by a primitive polynomial is a Hamming code of minimum
distance 3.

We next show that the generator polynomial g(X) and the parity-check polynomial h(X)
uniquely specify the generator matrix G and the parity-check matrix H, respectively.

To construct the 4-by-7 generator matrix G, we start with four vectors represented by
g(X) and three cyclic-shifted versions of it, as shown by

    g(X)     = 1 + X + X^3
    Xg(X)    = X + X^2 + X^4
    X^2 g(X) = X^2 + X^3 + X^5
    X^3 g(X) = X^3 + X^4 + X^6

The vectors g(X), Xg(X), X^2 g(X), and X^3 g(X) represent code polynomials in the (7,4)
Hamming code. If the coefficients of these polynomials are used as the elements of the
rows of a 4-by-7 matrix, we get the following generator matrix:

    G' = \begin{bmatrix}
            1 & 1 & 0 & 1 & 0 & 0 & 0 \\
            0 & 1 & 1 & 0 & 1 & 0 & 0 \\
            0 & 0 & 1 & 1 & 0 & 1 & 0 \\
            0 & 0 & 0 & 1 & 1 & 0 & 1
         \end{bmatrix}

Clearly, the generator matrix G' so constructed is not in systematic form. We can put it into
a systematic form by adding the first row to the third row, and adding the sum of the first
two rows to the fourth row. These manipulations result in the desired generator matrix:

    G = \begin{bmatrix}
            1 & 1 & 0 & 1 & 0 & 0 & 0 \\
            0 & 1 & 1 & 0 & 1 & 0 & 0 \\
            1 & 1 & 1 & 0 & 0 & 1 & 0 \\
            1 & 0 & 1 & 0 & 0 & 0 & 1
        \end{bmatrix}

which is exactly the same as that in Example 1.

We next show how to construct the 3-by-7 parity-check matrix H from the parity-check
polynomial h(X). To do this, we first take the reciprocal of h(X), namely X^4 h(X^{-1}). For the
problem at hand, we form three vectors represented by X^4 h(X^{-1}) and two shifted versions
of it, as shown by

    X^4 h(X^{-1}) = 1 + X^2 + X^3 + X^4
    X^5 h(X^{-1}) = X + X^3 + X^4 + X^5
    X^6 h(X^{-1}) = X^2 + X^4 + X^5 + X^6

Using the coefficients of these three vectors as the elements of the rows of the 3-by-7
parity-check matrix, we get

    H' = \begin{bmatrix}
            1 & 0 & 1 & 1 & 1 & 0 & 0 \\
            0 & 1 & 0 & 1 & 1 & 1 & 0 \\
            0 & 0 & 1 & 0 & 1 & 1 & 1
         \end{bmatrix}

Here again we see that the matrix H' is not in systematic form. To put it into a systematic
form, we add the third row to the first row to obtain

    H = \begin{bmatrix}
            1 & 0 & 0 & 1 & 0 & 1 & 1 \\
            0 & 1 & 0 & 1 & 1 & 1 & 0 \\
            0 & 0 & 1 & 0 & 1 & 1 & 1
        \end{bmatrix}

which is exactly the same as that of Example 1.
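The construction of both matrices, together with the row operations that bring them to systematic form, can be reproduced with a short sketch. The helper names `shifted_rows`, `add_rows`, and `orthogonal` are hypothetical; the final check confirms that every row of G is orthogonal to every row of H in modulo-2 arithmetic, as the relation between G and H requires.

```python
def shifted_rows(poly, n, count):
    """Rows for poly(X), X poly(X), ..., X^(count-1) poly(X) as n-tuples."""
    rows = []
    for shift in range(count):
        row = [0] * n
        for power, coeff in enumerate(poly):
            row[shift + power] = coeff
        rows.append(row)
    return rows


def add_rows(a, b):
    """Modulo-2 sum of two rows."""
    return [x ^ y for x, y in zip(a, b)]


def orthogonal(G, H):
    """True if every row of G has zero inner product (mod 2) with every row of H."""
    return all(sum(x & y for x, y in zip(gr, hr)) % 2 == 0 for gr in G for hr in H)


n, k = 7, 4
g = [1, 1, 0, 1]       # g(X) = 1 + X + X^3
h = [1, 1, 1, 0, 1]    # h(X) = 1 + X + X^2 + X^4

G_prime = shifted_rows(g, n, k)
H_prime = shifted_rows(h[::-1], n, n - k)   # reversed coefficients: X^k h(X^-1)

# Row operations described in the text.
G = [G_prime[0], G_prime[1],
     add_rows(G_prime[2], G_prime[0]),
     add_rows(G_prime[3], add_rows(G_prime[0], G_prime[1]))]
H = [add_rows(H_prime[0], H_prime[2]), H_prime[1], H_prime[2]]

print(G[2], G[3])        # [1, 1, 1, 0, 0, 1, 0] [1, 0, 1, 0, 0, 0, 1]
print(H[0])              # [1, 0, 0, 1, 0, 1, 1]
print(orthogonal(G, H))  # True
```

Reversing the coefficient list of h(X) is all that is needed to form the reciprocal polynomial, since h(X) has nonzero constant and leading terms.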
Figure 10.10 Encoder for the (7,4) cyclic code generated by g(X) = 1 + X + X^3.

Figure 10.10 shows the encoder for the (7,4) cyclic Hamming code generated by the
polynomial g(X) = 1 + X + X^3. To illustrate the operation of this encoder, consider the
message sequence (1001). The contents of the shift register are modified by the incoming
message bits as in Table 10.3. After four shifts, the contents of the shift register, and
therefore the parity-check bits, are (011). Accordingly, appending these parity-check bits
to the message bits (1001), we get the codeword (0111001); this result is exactly the same
as that determined earlier in Example 1.


Table 10.3 Contents of the shift register in the encoder
of Figure 10.10 for message sequence (1001)

    Shift    Input bit    Contents of shift register
    -        -            000 (initial state)
    1        1            110
    2        0            011
    3        0            111
    4        1            011


Figure 10.11 shows the corresponding syndrome calculator for the (7,4) Hamming
code. Let the transmitted codeword be (0111001) and the received word be (0110001);
that is, the middle bit is in error. As the received bits are fed into the shift register, initially
set to zero, its contents are modified as in Table 10.4. At the end of the seventh shift, the
syndrome is identified from the contents of the shift register as 110. Since the syndrome is
nonzero, the received word is in error. Moreover, from Table 10.2, we see that the error
pattern corresponding to this syndrome is 0001000. This indicates that the error is in the
middle bit of the received word, which is indeed the case.


Figure 10.11 Syndrome calculator for the (7,4) cyclic code generated by the
polynomial g(X) = 1 + X + X^3.
Table 10.4 Contents of the syndrome calculator
in Figure 10.11 for the received word (0110001)

    Shift    Input bit    Contents of shift register
    -        -            000 (initial state)
    1        1            100
    2        0            010
    3        0            001
    4        0            110
    5        1            111
    6        1            001
    7        0            110


Maximal-Length Codes 

For any positive integer m ≥ 3, there exists a maximal-length code with the following
parameters:

    block length:            n = 2^m - 1
    number of message bits:  k = m
    minimum distance:        d_min = 2^{m-1}

Maximal-length codes are generated by polynomials of the form

    g(X) = \frac{1 + X^n}{h(X)}

where h(X) is any primitive polynomial of degree m. Earlier we stated that any cyclic code
generated by a primitive polynomial is a Hamming code of minimum distance 3 (see
Example 2). It follows, therefore, that maximal-length codes are the dual of Hamming codes.

The polynomial h(X) defines the feedback connections of the encoder. The generator
polynomial g(X) defines one period of the maximal-length code, assuming that the encoder is in
the initial state 00...01. To illustrate this, consider the example of a (7,3) maximal-length
code, which is the dual of the (7,4) Hamming code described in Example 2. Thus, choosing

    h(X) = 1 + X + X^3

we find that the generator polynomial of the (7,3) maximal-length code is

    g(X) = 1 + X + X^2 + X^4

Figure 10.12 shows the encoder for the (7,3) maximal-length code. The period of the code
is n = 7. Thus, assuming that the encoder is in the initial state 001, as indicated in Figure
10.12, we find the output sequence is described by

    100                1110100
    (initial state)    (g(X) = 1 + X + X^2 + X^4)
Figure 10.12 Encoder for the (7,3) maximal-length code;
the initial state of the encoder is shown in the figure.


This result is readily validated by cycling through the encoder of Figure 10.12. 

Note that if we were to choose the other primitive polynomial 

    h(X) = 1 + X^2 + X^3

for the (7,3) maximal-length code, we would simply get the “image” of the code described 
above, and the output sequence would be “reversed” in time. 
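As a numerical check on the (7,3) example, the sketch below divides 1 + X^7 by h(X) = 1 + X + X^3 to recover g(X), then lists all codewords a(X)g(X) and their Hamming weights. It does not model the wiring of the shift-register encoder; `gf2_divmod` and `gf2_mul` are hypothetical helpers operating on coefficient lists (index = power of X).

```python
from itertools import product


def gf2_divmod(dividend, divisor):
    """Quotient and remainder of GF(2) polynomial division."""
    rem = list(dividend)
    deg_d = len(divisor) - 1
    quot = [0] * max(len(rem) - deg_d, 1)
    for shift in range(len(rem) - deg_d - 1, -1, -1):
        if rem[shift + deg_d]:
            quot[shift] = 1
            for j, d in enumerate(divisor):
                rem[shift + j] ^= d
    return quot, rem[:deg_d]


def gf2_mul(a, b):
    """Product of two GF(2) polynomials."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out


m = 3
n = 2**m - 1
h = [1, 1, 0, 1]                              # h(X) = 1 + X + X^3
x_n_plus_1 = [1] + [0] * (n - 1) + [1]        # 1 + X^7

g, remainder = gf2_divmod(x_n_plus_1, h)
print(g, remainder)   # [1, 1, 1, 0, 1] [0, 0, 0]

# Every message polynomial a(X) of degree m - 1 or less gives one codeword.
codewords = [gf2_mul(list(a), g) for a in product([0, 1], repeat=m)]
weights = sorted(sum(c) for c in codewords)
print(weights)        # [0, 4, 4, 4, 4, 4, 4, 4]
```

All seven nonzero codewords have weight 4 = 2^{m-1}, in agreement with the stated minimum distance, and they are the seven cyclic shifts of the period 1110100.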


A study of cyclic codes for error control would be incomplete without a discussion of 
Reed-Solomon codes, albeit briefly. 

Unlike the cyclic codes considered so far in this section, Reed-Solomon codes are nonbinary
codes. A cyclic code is said to be nonbinary in that given the code vector

    c = (c_0, c_1, \ldots, c_{n-1})

the coefficients {c_i}, i = 0, 1, ..., n - 1, are not binary 0 or 1. Rather, the c_i are themselves
made up of sequences of 0s and 1s, with each sequence being of length m. A Reed-Solomon code
is therefore said to be a q-ary code, which means that the size of the alphabet used in
construction of the code is q = 2^m. To be specific, a Reed-Solomon (n,k) code is used to
encode m-bit symbols into blocks consisting of n = 2^m - 1 symbols; that is, m(2^m - 1)
bits, where m ≥ 1. Thus, the encoding algorithm expands a block of k symbols to n
symbols by adding n - k redundant symbols. When m is an integer power of 2, the m-bit
symbols are called bytes. A popular value of m is 8; indeed, 8-bit Reed-Solomon codes are
extremely powerful.

A t-error-correcting Reed-Solomon code has the following parameters:

    block length:       n = 2^m - 1 symbols
    message size:       k symbols
    parity-check size:  n - k = 2t symbols
    minimum distance:   d_min = 2t + 1 symbols

The block length of the Reed-Solomon code is one less than the size of the code alphabet
(2^m), and the minimum distance is one greater than the number of parity-check symbols.
Reed-Solomon codes make highly efficient use of redundancy; block lengths and symbol sizes
can be adjusted readily to accommodate a wide range of message sizes. Moreover,
Reed-Solomon codes provide a wide range of code rates that can be chosen to optimize
performance, and efficient techniques are available for their use in certain practical
applications. In particular, a distinctive feature of Reed-Solomon codes is their ability to
correct bursts of errors, hence their application in wireless communications to combat the
fading phenomenon.
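The parameter relations listed above lend themselves to a quick calculation. The sketch below only evaluates those relations; it is not a Reed-Solomon encoder, and the function name and the choice m = 8, t = 16 are illustrative assumptions.

```python
def reed_solomon_parameters(m, t):
    """Evaluate the parameter relations of a t-error-correcting code on m-bit symbols."""
    n = 2**m - 1           # block length in symbols
    k = n - 2 * t          # message size in symbols
    if k <= 0:
        raise ValueError("t is too large for this symbol size")
    return {
        "n": n,
        "k": k,
        "parity_symbols": 2 * t,
        "d_min": 2 * t + 1,
        "bits_per_block": m * n,
        "rate": k / n,
    }


params = reed_solomon_parameters(m=8, t=16)
print(params["n"], params["k"], params["d_min"], params["bits_per_block"])
# 255 223 33 2040
```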


Convolutional Codes 


In block coding, the encoder accepts a k-bit message block and generates an n-bit
codeword, which contains n - k parity-check bits. Thus, codewords are produced on a
block-by-block basis. Clearly, provision must be made in the encoder to buffer an entire
message block before generating the associated codeword. There are applications,
however, where the message bits come in serially rather than in large blocks, in which
case the use of a buffer may be undesirable. In such situations, the use of convolutional
coding may be the preferred method. A convolutional coder generates redundant bits by
using modulo-2 convolutions; hence the name convolutional codes.

The encoder of a binary convolutional code with rate 1/n, measured in bits per symbol,
may be viewed as a finite-state machine that consists of an M-stage shift register with
prescribed connections to n modulo-2 adders and a multiplexer that serializes the outputs
of the adders. A sequence of message bits produces a coded output sequence of length
n(L + M) bits, where L is the length of the message sequence. The code rate is therefore
given by

    r = \frac{L}{n(L + M)} = \frac{1}{n(1 + M/L)}  bits/symbol

Typically, we have L ≫ M, in which case the code rate is approximately defined by

    r \approx \frac{1}{n}  bits/symbol
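A small calculation shows how quickly the exact rate approaches 1/n as the message grows. The function name `code_rate` is a hypothetical helper; the values n = 2 and M = 2 are assumed for illustration.

```python
from fractions import Fraction


def code_rate(L, M, n):
    """Exact code rate L / (n (L + M)) of a terminated rate-1/n convolutional code."""
    return Fraction(L, n * (L + M))


print(code_rate(5, 2, 2))            # 5/14
print(float(code_rate(1000, 2, 2)))  # about 0.499, close to 1/n = 0.5
```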

An important characteristic of a convolutional code is its constraint length, which we
define as follows:

The constraint length of a convolutional code, expressed in terms of message bits, is the
number of shifts over which a single incoming message bit can influence the encoder output.

In an encoder with an M-stage shift register, the memory of the encoder equals M message
bits. Correspondingly, the constraint length, denoted by ν, equals M + 1 shifts that are
required for a message bit to enter the shift register and finally come out.

Figure 10.13 shows a convolutional encoder with n = 2 modulo-2 adders (i.e., two output
bits for every message bit) and constraint length ν = 3. In this example, the code rate of the
encoder is 1/2. The encoder operates on the incoming message sequence, one bit at a time,
through a convolution process; because neither output path reproduces the message bits in
unaltered form, the code is said to be nonsystematic.
Figure 10.13 Constraint length-3, rate-1/2 convolutional encoder.

Each path connecting the output to the input of a convolutional encoder may be
characterized in terms of its impulse response, defined as follows:

The impulse response of a particular path in the convolutional encoder is the response of
that path to symbol 1 applied to its input, with each flip-flop in the encoder set initially
to the zero state.

Equivalently, we may characterize each path in terms of a generator polynomial, defined
as the unit-delay transform of the impulse response. To be specific, let the generator
sequence (g_0^{(i)}, g_1^{(i)}, g_2^{(i)}, ..., g_M^{(i)}) denote the impulse response of the ith
path, where the coefficients g_0^{(i)}, g_1^{(i)}, g_2^{(i)}, ..., g_M^{(i)} equal symbol 0 or 1.
Correspondingly, the generator polynomial of the ith path is defined by

    g^{(i)}(D) = g_0^{(i)} + g_1^{(i)} D + g_2^{(i)} D^2 + \cdots + g_M^{(i)} D^M

where D denotes the unit-delay variable. The complete convolutional encoder is described
by the set of generator polynomials g^{(i)}(D), one for each of the n paths (i = 1, 2, ..., n).

Convolutional Encoder 

Consider again the convolutional encoder of Figure 10.13, which has two paths numbered
1 and 2 for convenience of reference. The impulse response of path 1 (i.e., upper path) is
(1, 1, 1). Hence, the generator polynomial of this path is

    g^{(1)}(D) = 1 + D + D^2

The impulse response of path 2 (i.e., lower path) is (1, 0, 1). The generator polynomial of
this second path is

    g^{(2)}(D) = 1 + D^2

For an incoming message sequence given by (10011), for example, we have the
polynomial representation

    m(D) = 1 + D^3 + D^4

As with Fourier transformation, convolution in the time domain is transformed into
multiplication in the D-domain. Hence, the output polynomial of path 1 is given by

    c^{(1)}(D) = g^{(1)}(D) m(D)
               = (1 + D + D^2)(1 + D^3 + D^4)
               = 1 + D + D^2 + D^3 + D^6

where it is noted that the sums D^4 + D^4 and D^5 + D^5 are both zero in accordance with the
rules of binary arithmetic. We therefore immediately deduce that the output sequence of
path 1 is (1111001). Similarly, the output polynomial of path 2 is given by

    c^{(2)}(D) = g^{(2)}(D) m(D)
               = (1 + D^2)(1 + D^3 + D^4)
               = 1 + D^2 + D^3 + D^4 + D^5 + D^6

The output sequence of path 2 is therefore (1011111). Finally, multiplexing the two output
sequences of paths 1 and 2, we get the encoded sequence

    c = (11, 10, 11, 11, 01, 01, 11)

Note that the message sequence of length L = 5 bits produces an encoded sequence of
length n(L + ν - 1) = 14 bits. Note also that for the shift register to be restored to its initial
all-zero state, a terminating sequence of ν - 1 = 2 zeros is appended to the last input bit of
the message sequence. The terminating sequence of ν - 1 zeros is called the tail of the
message.
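The same result can be obtained directly in the time domain by convolving the message with each impulse response in modulo-2 arithmetic. The sketch below is a minimal illustration; the function name `conv_encode` is hypothetical, and the tail of zeros is appended inside the function.

```python
def conv_encode(message, generators):
    """Encode with a rate-1/n convolutional code.

    message: list of message bits; generators: one impulse response per path.
    Returns one tuple of n output bits per input bit, tail included.
    """
    memory = max(len(g) for g in generators) - 1
    padded = list(message) + [0] * memory      # append the tail of zeros
    outputs = []
    for j in range(len(padded)):
        symbol = []
        for g in generators:
            bit = 0
            for l, gl in enumerate(g):
                if gl and j - l >= 0:
                    bit ^= padded[j - l]       # modulo-2 convolution sum
            symbol.append(bit)
        outputs.append(tuple(symbol))
    return outputs


encoded = conv_encode([1, 0, 0, 1, 1], [(1, 1, 1), (1, 0, 1)])
print(encoded)
# [(1, 1), (1, 0), (1, 1), (1, 1), (0, 1), (0, 1), (1, 1)]
```

Each tuple is one multiplexed pair (path 1, path 2), so the list reads off the encoded sequence pair by pair.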


Traditionally, the structural properties of a convolutional encoder are portrayed in 
graphical form by using any one of three equivalent graphs: code tree, trellis graph, and 
state graph. 

Although, indeed, these three graphical representations of a convolutional encoder look 
different, their compositions follow the same underlying rule: 


Hereafter, we refer to this convention as the graphical rule of a convolutional encoder. 

We will use the convolutional encoder of Figure 10.13 as a running example to 
illustrate the insights that each one of these three diagrams provides. 


We begin the graphical representation of a convolutional encoder with the code tree of
Figure 10.14. Each branch of the tree represents an input bit, with the corresponding pair
of output bits indicated on the branch. The convention used to distinguish the input bits 0
and 1 follows the graphical rule described above. Thus, a specific path in the tree is traced
from left to right in accordance with the message sequence. The corresponding coded bits
on the branches of that path constitute the encoded sequence. Consider, for example, the
message sequence (10011) applied to the input of the encoder of Figure 10.13. Following the
procedure just described, we find that the corresponding encoded sequence is
(11, 10, 11, 11, 01), which agrees with the first five pairs of bits in the encoded sequence c
that was derived in Example 4.

Figure 10.14 Code tree for the convolutional encoder of Figure 10.13.

From Figure 10.14, we observe that the tree becomes repetitive after the first three
branches. Indeed, beyond the third branch, the two nodes labeled a are identical and so are
all the other node pairs that are identically labeled. We may establish this repetitive
property of the tree by examining the associated encoder of Figure 10.13. The encoder has
memory M = ν - 1 = 2 message bits. We therefore find that, when the third message bit
enters the encoder, the first message bit is shifted out of the register. Consequently, after the
third branch, the message sequences (1 0 0 m_3 m_4 ...) and (0 0 0 m_3 m_4 ...) generate the same
code symbols, and the pair of nodes labeled a may be joined together. The same reasoning
applies to the other nodes in the code tree. Accordingly, we may collapse the code tree of
Figure 10.14 into the new form shown in Figure 10.15, which is called a trellis. It is so
called since a trellis is a treelike structure with re-emerging branches. The convention used
in Figure 10.15 to distinguish between input symbols 0 and 1 is as follows:


As before, each message sequence corresponds to a specific path through the trellis. For
example, we readily see from Figure 10.15 that the message sequence (10011) produces
the encoded output sequence (11, 10, 11, 11, 01), which agrees with our previous result.


In conceptual terms, a trellis is more instructive than a tree. We say so because it brings out 
explicitly the fact that the associated convolutional encoder is in actual fact a finite-state 
machine. Basically, such a machine consists of a tapped shift register and, therefore, has a 
finite state; hence the name of the machine. Thus, we may conveniently say the following: 


For example, the convolutional encoder of Figure 10.13 has a shift register made up of two
memory cells. With the message bit stored in each memory cell being 0 or 1, it follows
that this encoder can assume any one of 2^2 = 4 possible states, as described in Table 10.5.


Figure 10.15 Trellis for the convolutional encoder of Figure 10.13.
Table 10.5 State table for the convolutional encoder of Figure 10.13

    State    Binary description
    a        00
    b        10
    c        01
    d        11

In describing a convolutional encoder, the notion of state is important in the following 
sense: 


To illustrate this statement, consider the general case of a rate 1/n convolutional encoder of
constraint length v. Let the state of the encoder at time-unit j be denoted by

$$S = (m_{j-1}, m_{j-2}, \ldots, m_{j-v+1})$$

The jth codeword $c_j$ is completely determined by the state S together with the current
message bit $m_j$.

Now that we understand the notion of state, the trellis graph of the simple convolutional 
encoder of Figure 10.13 for v = 3 is presented in Figure 10.15. From this latter figure, we 
now clearly see a unique characteristic of the trellis diagram: 


To be more specific, the first v - 1 = 2 time-steps correspond to the encoder’s departure 
from the initial zero state and the last v - 1 = 2 time-steps correspond to the encoder’s 
return to the initial zero state. Naturally, not all the states of the encoder can be reached in 
these two particular portions of the trellis. However, in the central portion of the trellis, for 
which time-unit j lies in the range $v - 1 \le j \le L$, where L is the length of the incoming
message sequence, we do see that all the four possible states of the encoder are reachable. 
Note also that the central portion of the trellis exhibits a fixed periodic structure, as 
illustrated in Figure 10.16a. 


The periodic structure characterizing the trellis leads us next to the state diagram of a 
convolutional encoder. To be specific, consider a central portion of the trellis 
corresponding to times j and j + 1. We assume that for $j \ge 2$ in the example of Figure
10.13, it is possible for the current state of the encoder to be a, b, c, or d. For convenience 
of presentation, we have reproduced this portion of the trellis in Figure 10.16a. The left 
nodes represent the four possible current states of the encoder, whereas the right nodes 
represent the next states. Clearly, we may coalesce the left and right nodes. By so doing, 
we obtain the state graph of the encoder, shown in Figure 10.16b. The nodes of the figure 
represent the four possible states of the encoder a, b, c, and d, with each node having two 
incoming branches and two outgoing branches, following the graphical rule described 
previously. 
Figure 10.16 (a) A portion of the central part of the trellis for the encoder of
Figure 10.13. (b) State graph of the convolutional encoder of Figure 10.13.


The binary label on each branch represents the encoder’s output as it moves from one 
state to another. Suppose, for example, the current state of the encoder is (01), which is 
represented by node c. The application of input symbol 1 to the encoder of Figure 10.13 
results in the state (10) and the encoded output (00). Accordingly, with the help of this 
state diagram, we may readily determine the output of the encoder of Figure 10.13 for any 
incoming message sequence. We simply start at state a, the initial all-zero state, and walk 
through the state graph in accordance with the message sequence. We follow a solid 
branch if the input is bit 0 and a dashed branch if it is bit 1. As each branch is traversed, we
output the corresponding binary label on the branch. Consider, for example, the message 
sequence (10011). For this input, we follow the path abcabd, and therefore output the 
sequence (11, 10, 11, 11, 01), which agrees exactly with our previous result. Thus, the 
input-output relation of a convolutional encoder is also completely described by its state 
graph. 
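The walk through the state graph can be sketched in a few lines of Python. This is a minimal illustration, not the book's own implementation: the generator taps (1, 1, 1) and (1, 0, 1) are an assumption (they are not stated in this passage) chosen because they reproduce the branch labels quoted above, and the names `step` and `encode` are hypothetical.

```python
# Assumed generator taps for the rate-1/2, constraint-length-3 encoder.
# They are not given in the text; they are chosen because they reproduce
# the outputs quoted in the surrounding discussion.
G = ((1, 1, 1), (1, 0, 1))

# State = (m_{j-1}, m_{j-2}), labeled as in the state table.
STATE_NAMES = {(0, 0): "a", (1, 0): "b", (0, 1): "c", (1, 1): "d"}


def step(state, bit):
    """Return (next_state, output_pair) for one input bit."""
    window = (bit,) + state  # (m_j, m_{j-1}, m_{j-2})
    out = tuple(sum(g * w for g, w in zip(taps, window)) % 2 for taps in G)
    return (bit, state[0]), out


def encode(bits, state=(0, 0)):
    """Walk the state graph from the given state; return (path, outputs)."""
    path, outputs = [STATE_NAMES[state]], []
    for bit in bits:
        state, out = step(state, bit)
        path.append(STATE_NAMES[state])
        outputs.append("".join(map(str, out)))
    return "".join(path), outputs


path, outputs = encode([1, 0, 0, 1, 1])
print(path)     # abcabd
print(outputs)  # ['11', '10', '11', '11', '01']
```

Under these assumed taps, the sketch visits the states a, b, c, a, b, d and emits the same code symbols as the text, and `step((0, 1), 1)` returns the state (1, 0) with output (0, 0), as in the example for node c.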


The convolutional codes described thus far in this section have been feedforward 
structures of the nonsystematic variety. There is another type of linear convolutional code
that is the exact opposite, being recursive as well as systematic; such codes are called recursive
systematic convolutional (RSC) codes.
Figure 10.17 Example of a recursive systematic convolutional (RSC) encoder.


Figure 10.17 illustrates a simple example of an RSC code, two distinguishing features 
of which stand out in the figure: 

1. The code is systematic, in that the incoming message vector $m_j$ at time-unit j defines
the systematic part of the code vector $c_j$ at the output of the encoder.

2. The code is recursive by virtue of the fact that the other constituent of the code
vector, namely the parity-check vector $b_j$, is related to the message vector $m_j$ by the
modulo-2 recursive equation

$$m_j + b_{j-1} = b_j \tag{10.50}$$

where $b_{j-1}$ is the past value of $b_j$ stored in the memory of the encoder.

From an analytic point of view, in studying RSC codes, it is more convenient to work in
the transform D-domain than the time domain. By definition, we have

$$b_{j-1} = D[b_j]$$

and therefore rewrite (10.50) in the equivalent form:

$$b_j = \left(\frac{1}{1 + D}\right) m_j$$

where the transfer function 1/(1 + D) operates on $m_j$ to produce $b_j$. With the code vector $c_j$
consisting of the message vector $m_j$ followed by the parity-check vector $b_j$, we may
express the code vector $c_j$ produced in response to the message vector $m_j$ as follows:

$$c_j = (m_j, b_j) = \left(1, \frac{1}{1 + D}\right) m_j$$


It follows, therefore, that the code generator for the RSC code of Figure 10.17 is given by
the matrix

$$G(D) = \left(1, \frac{1}{1 + D}\right)$$
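As a rough illustration of the recursion behind this generator, the following Python sketch implements the parity equation b_j = m_j + b_{j-1} (modulo 2) directly. The function name `rsc_encode` and the zero initial condition b_{-1} = 0 are assumptions made for the example.

```python
def rsc_encode(message):
    """Encode with the recursion b_j = m_j XOR b_{j-1}, assuming b_{-1} = 0.

    Returns a list of (systematic bit, parity bit) pairs.
    """
    b_prev = 0
    code = []
    for m in message:
        b = m ^ b_prev          # modulo-2 addition of m_j and b_{j-1}
        code.append((m, b))
        b_prev = b              # feedback: the parity bit is stored in memory
    return code


# A single 1 followed by zeros: the parity stream is the expansion of 1/(1 + D).
impulse = [1, 0, 0, 0, 0, 0]
parity = [b for _, b in rsc_encode(impulse)]
print(parity)  # [1, 1, 1, 1, 1, 1]
```

The parity response to a single nonzero message bit does not die out, which is the time-domain counterpart of the expansion 1/(1 + D) = 1 + D + D^2 + ... in modulo-2 arithmetic.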


Generalizing, we may now make the statement: 
The same statement applies equally well to the parity-check generator H(D) compared
with its time-domain counterpart H.

The rationale behind making convolutional codes recursive is to feed one or more of 
the tap-outputs in the shift register back to the encoder input, which, in turn, makes the 
internal state of the shift register depend on past outputs. This modification, compared 
with a feedforward convolutional code, affects the behavior of error patterns in a profound 
way, which is emphasized in the following statement: 


This property of recursive convolutional codes turns out to be one of the key factors 
behind the outstanding performance achieved by the class of turbo codes, to be discussed 
in Section 10.12. Therein, we shall see that feedback plays a key role not only in the 
encoder of turbo codes but also in the decoder. For reasons that will become apparent later,
further work on turbo codes will be deferred to Section 10.12. 

Optimum Decoding of Convolutional Codes 


In the meantime, we resume the discussion on convolutional codes whose encoders are of 
the feedforward variety, aimed at the development of two different decoding algorithms, 
each of which is optimum according to a criterion of its own. 

The first algorithm is the maximum likelihood (ML) decoding algorithm; the decoder is
itself referred to as the maximum likelihood decoder (maximum likelihood estimation was
discussed in Chapter 3). A distinctive feature of this decoder is that it produces a codeword
as output, the conditional probability of which is always maximized on the assumption
that each codeword in the code is equiprobable. From Chapter 3 on probability theory, we
recall that the conditional probability density function of a random variable X given a
quantity θ can be rethought as the likelihood function of θ, with that function being
dependent on X, given the parameter θ. We may therefore make the statement:


The second algorithm is the maximum a posteriori (MAP) probability decoding algorithm;
the decoder is correspondingly referred to as a MAP decoder. In light of this second 
algorithm’s name, we may make the statement: 


These two decoding algorithms, optimal in accordance with their own respective criteria, 
are distinguished from each other as follows: 
Stated in another way, we may say:


Typically, the ML decoder is simpler to implement; hence its popular use in practice. 
However, the MAP decoding algorithm is preferred over the ML decoding algorithm in 
the following two situations: 

1. The information bits are not equally likely.

2. Iterative decoding is used in the receiver, in which case the a priori probabilities of
the message bits change from one iteration to the next; such a situation arises in
turbo decoding, which is discussed in Section 10.12.


The ML decoding algorithm is applied to convolutional codes in Section 10.8; in so doing, 
we are, in effect, opting for a simple approach to decode convolutional codes. This simple 
approach is also applicable to another class of codes, called trellis-coded modulation, 
which is discussed in Section 10.15. 

Then, in Section 10.9 we move on to study the MAP decoding algorithm; the length of 
that section and the illustrative example in Section 10.10 are testimony to the complexity 
of this second approach to decoding convolutional codes. Equipped with the MAP
algorithm and its modified forms, Sections 10.12 and 10.13 discuss their application to
turbo codes. It is in the material covered in those two sections that we find the practical 
benefits of feedback in decoding turbo codes. 


Maximum Likelihood Decoding of Convolutional Codes 


We begin the discussion of decoding convolutional codes by first describing the 
underlying theory of maximum likelihood decoding. The description is best understood by 
focusing on a trellis that represents each time step in the decoding process with a separate 
state graph. 

Let m denote a message vector and c denote the corresponding code vector applied by 
the encoder to the input of a discrete memoryless channel. Let r denote the received 
vector, which, in practice, will invariably differ from the transmitted code vector c due to 
additive channel noise. Given the received vector r, the decoder is required to make an
estimate $\hat{m}$ of the message vector m. Since there is a one-to-one correspondence between
the message vector m and the code vector c, the decoder may equivalently produce an
estimate $\hat{c}$ of the code vector. We may then put

$$\hat{m} = m \quad \text{if and only if} \quad \hat{c} = c$$

Otherwise, a decoding error is committed in the receiver. The decoding rule for choosing
the estimate $\hat{c}$, given the received vector r, is said to be optimum when the probability of
decoding error is minimized. In light of the material presented on signaling over AWGN
channels in Chapter 7, we may state:
Let P(r|c) denote the conditional probability of receiving r, given that c was sent. The
log-likelihood function equals ln P(r|c), where ln denotes the natural logarithm. The
maximum likelihood decoder for decision making is described as follows:


Consider next the special case of a binary symmetric channel. In this case, both the 
transmitted code vector c and the received vector r represent binary sequences of some 
length N. Naturally, these two sequences may differ from each other in some locations 
because of errors due to channel noise. Let $c_i$ and $r_i$ denote the ith elements of c and r,
respectively. We then have

$$P(r|c) = \prod_{i=1}^{N} p(r_i|c_i)$$

Correspondingly, the log-likelihood function is

$$\ln P(r|c) = \sum_{i=1}^{N} \ln p(r_i|c_i) \tag{10.55}$$

The term $p(r_i|c_i)$ in (10.55) denotes a transition probability, which is defined by

$$p(r_i|c_i) = \begin{cases} p, & \text{if } r_i \ne c_i \\ 1 - p, & \text{if } r_i = c_i \end{cases}$$


Suppose also that the received vector r differs from the transmitted code vector c in
exactly d places in the codeword. By definition, the number d is the Hamming distance
between the vectors r and c. Hence, we may rewrite the log-likelihood function in (10.55)
as follows:

$$\begin{aligned} \ln P(r|c) &= d \ln p + (N - d)\ln(1 - p) \\ &= d \ln\left(\frac{p}{1 - p}\right) + N \ln(1 - p) \end{aligned}$$

In general, the probability of an error occurring is low enough for us to assume p < 1/2. We
also recognize that $N \ln(1 - p)$ is a constant for all c. Since p < 1/2 makes $\ln(p/(1 - p))$
negative, the log-likelihood is largest when d is smallest. Accordingly, we may restate the
maximum-likelihood decoding rule for the binary symmetric channel as follows:


That is, for the binary symmetric channel, the maximum-likelihood decoder for a 
convolutional code reduces to a minimum distance decoder. In such a decoder, the 
received vector r is compared with each possible transmitted code vector c, and the 
particular one closest to r is chosen as the correct transmitted code vector. The term 
“closest” is used in the sense of minimum number of differing binary symbols (i.e.,
Hamming distance) between the code and received vectors under investigation. 
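The equivalence can be checked numerically on a small case. The sketch below is only an illustration: the four-word codebook, the received vector, and the value p = 0.1 are hypothetical, and the search is a brute-force comparison over all codewords rather than a practical decoder.

```python
import math


def hamming(r, c):
    """Number of positions in which r and c differ."""
    return sum(ri != ci for ri, ci in zip(r, c))


def log_likelihood(r, c, p):
    """ln P(r|c) for a binary symmetric channel with transition probability p."""
    d = hamming(r, c)
    return d * math.log(p) + (len(r) - d) * math.log(1 - p)


# Hypothetical toy code (a linear code with four codewords), for illustration only.
codebook = [(0, 0, 0, 0, 0), (1, 1, 1, 0, 0), (0, 0, 1, 1, 1), (1, 1, 0, 1, 1)]
r = (1, 0, 1, 0, 0)   # hypothetical received vector
p = 0.1               # assumed transition probability, p < 1/2

ml_choice = max(codebook, key=lambda c: log_likelihood(r, c, p))
md_choice = min(codebook, key=lambda c: hamming(r, c))
print(ml_choice, md_choice)  # (1, 1, 1, 0, 0) (1, 1, 1, 0, 0)
```

Because the log-likelihood is a decreasing function of the Hamming distance whenever p < 1/2, ranking the codewords by likelihood and ranking them by distance give the same order.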


The equivalence between maximum likelihood decoding and minimum distance decoding 
for the binary symmetric channel implies that we may decode a convolutional code by 
choosing a path in the code tree whose coded sequence differs from the received sequence 
in the fewest number of places. Since a code tree is equivalent to a trellis, we may equally 
limit our choice to the possible paths in the trellis representation of the code. The reason 
for preferring the trellis over the tree is that the number of nodes at each time instant does 
not continue to grow as the number of incoming message bits increases; rather, it remains 
constant at $2^{v-1}$, where v is the constraint length of the code.

Consider, for example, the trellis diagram of Figure 10.15 for a convolutional code with 
rate r = 1/2 and constraint length v = 3. We observe that, at time-unit j = 3, there are two 
paths entering any of the four nodes in the trellis. Moreover, these two paths will be 
identical onward from that point. Clearly, a minimum distance decoder may make a 
decision at that point as to which of those two paths to retain, without any loss of 
performance. A similar decision may be made at time-unit j = 4, and so on. This sequence 
of decisions is exactly what the Viterbi algorithm does as it walks through the trellis. The 
algorithm operates by computing a metric (i.e., discrepancy) for every possible path in the 
trellis; hence the following statement: 


Thus, for each node (state) in the trellis of Figure 10.15 the algorithm compares the two 
paths entering the node. The path with the lower metric is retained and the other path is 
discarded. This computation is repeated for every time-unit j of the trellis in the range
$M \le j \le L$, where M = v - 1 is the encoder’s memory and L is the length of the
incoming message sequence. The paths that are retained by the algorithm are called
survivor or active paths. For a convolutional code of constraint length v = 3, for
example, no more than $2^{v-1} = 4$ survivors and their metrics will ever be stored. The
list of $2^{v-1}$ paths computed in the manner just described is always guaranteed to
contain the maximum-likelihood choice.

A difficulty that may arise in the application of the Viterbi algorithm is the possibility 
that when the paths entering a state are compared, their metrics are found to be identical. 
In such a situation, we simply make the choice by flipping a fair coin (i.e., simply make a 
random guess). 

To sum up: 


The algorithm proceeds in a step-by-step fashion, as summarized in Table 10.6. 
Table 10.6 Summary of the Viterbi algorithm

The Viterbi algorithm is a maximum likelihood decoder, which is optimal for any
discrete memoryless channel. It proceeds in three basic steps. In computational terms,
the so-called add-compare-select (ACS) operation in Step 2 is at the heart of the
Viterbi algorithm.

Initialization
Set the metric of the all-zero state of the trellis to zero.

Step 1: time-unit j
Start the computation at some time-unit j and determine the metric for the path that
enters each state of the trellis. Hence, identify the survivor and store the metric for each
one of the states.

Step 2: time-unit j + 1
For the next time-unit j + 1, determine the metrics for all $2^{v-1}$ paths that enter a state,
where v is the constraint length of the convolutional encoder; hence do the following:

a. Add the metrics entering the state to the metric of the survivor at the preceding
time-unit j;
b. Compare the metrics of all $2^{v-1}$ paths entering the state;
c. Select the survivor with the best metric (the largest log-likelihood or, equivalently
for the binary symmetric channel, the smallest Hamming distance), store it along with
its metric, and discard all other paths in the trellis.

Step 3: continuation of the search to convergence
Repeat Step 2 for time-unit j < L + L', where L is the length of the message sequence
and L' is the length of the termination sequence.

Stop the computation once the time-unit j = L + L' is reached.
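The add-compare-select recursion summarized above can be sketched in Python as follows. This is a minimal hard-decision sketch under several assumptions that are not taken from the text: the generator taps (1, 1, 1) and (1, 0, 1) for the rate-1/2, v = 3 encoder; the Hamming distance as the path metric (appropriate for a binary symmetric channel); ties resolved by keeping the first candidate rather than by a coin flip; and a terminated trellis, so that the survivor ending in the all-zero state is returned. The names `branch`, `viterbi`, and `pairs` are hypothetical.

```python
# Assumed generator taps for the rate-1/2, constraint-length-3 encoder.
G = ((1, 1, 1), (1, 0, 1))


def branch(state, bit):
    """Return (next_state, output_pair) for one input bit."""
    window = (bit,) + state  # (m_j, m_{j-1}, m_{j-2})
    out = tuple(sum(g * w for g, w in zip(taps, window)) % 2 for taps in G)
    return (bit, state[0]), out


def viterbi(received_pairs):
    """Hard-decision Viterbi decoding with the Hamming distance as metric.

    Returns (decoded_bits, metric) for the survivor ending in the all-zero
    state, i.e. a terminated trellis is assumed.
    """
    survivors = {(0, 0): (0, [])}  # state -> (metric, input bits so far)
    for r in received_pairs:
        candidates = {}
        for state, (metric, bits) in survivors.items():
            for bit in (0, 1):
                nxt, out = branch(state, bit)
                # Add: branch metric plus the metric of the survivor at time j.
                total = metric + sum(a != b for a, b in zip(out, r))
                # Compare and select: keep the smaller discrepancy; on a tie
                # the first candidate is kept (a deterministic simplification).
                if nxt not in candidates or total < candidates[nxt][0]:
                    candidates[nxt] = (total, bits + [bit])
        survivors = candidates
    metric, bits = survivors[(0, 0)]
    return bits, metric


def pairs(s):
    """Split a bit string such as '0100' into [(0, 1), (0, 0)]."""
    return [(int(s[i]), int(s[i + 1])) for i in range(0, len(s), 2)]


print(viterbi(pairs("0100010000")))  # ([0, 0, 0, 0, 0], 2)
print(viterbi(pairs("1100010000")))  # ([1, 0, 0, 0, 0], 2)
```

The two received sequences are the ones considered in the examples that follow: with two errors the all-zero message is recovered, whereas with three errors the survivor at the all-zero state is no longer the all-zero path.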


EXAMPLE 5 Correct Decoding of Received All-Zero Sequence

Suppose that the encoder of Figure 10.13 generates an all-zero sequence that is sent over a 
binary symmetric channel and that the received sequence is (0100010000 ...). There are 
two errors in the received sequence due to noise in the channel: one in the second bit and 
the other in the sixth bit. We wish to show that this double-error pattern is correctable 
through the application of the Viterbi decoding algorithm. 

In Figure 10.18 we show the results of applying the algorithm for time-unit j = 1, 2, 3, 
4, 5. We see that for j = 2 there are (for the first time) four paths, one for each of the four 
states of the encoder. The figure also includes the metric of each path for each level in the 
computation. 

On the left side of Figure 10.18, for time-unit j = 3 we show the paths entering each of
the states, together with their individual metrics. On the right side of the figure we show the
four survivors that result from application of the algorithm for time-unit j = 3, 4, 5. 
Examining the four survivors in the figure for j = 5, we see that the all-zero path has the 
smallest metric and will remain the path of smallest metric from this point forward. This 
clearly shows that the all-zero sequence is indeed the maximum likelihood choice of the 
Viterbi decoding algorithm, which agrees exactly with the transmitted sequence. 
Figure 10.18 Illustrating steps in the Viterbi algorithm for Example 5.
EXAMPLE 6 Incorrect Decoding of Received All-Zero Sequence

Suppose next that the received sequence is (1100010000 . . .), which contains three errors 
compared with the transmitted all-zero sequence; two of the errors are adjacent to each 
other and the third is some distance away. 

In Figure 10.19, we show the results of applying the Viterbi decoding algorithm for 
levels j = 1, 2, 3, 4. We see that in this second example on Viterbi decoding the correct 
path has been eliminated by time-unit j = 3. Clearly, a triple-error pattern is uncorrectable 
by the Viterbi algorithm when applied to a convolutional code of rate 1/2 and constraint 
length v = 3. The exception to this rule is a triple-error pattern spread over a time
span longer than one constraint length, in which case it is likely to be correctable. 
In Example 5 there were two errors in the received sequence, whereas in Example 6 there
were three errors, two of which were in adjacent symbols and the third one was some
distance away. In both examples the encoder used to generate the transmitted sequence
was the same. The difference between the two examples is attributable to the fact that the
number of errors in Example 6 was beyond the error-correcting capability of the
maximum likelihood decoding algorithm, which is the next topic for discussion.


The performance of a convolutional code depends not only on the decoding algorithm 
used but also on the distance properties of the code. In this context, the most important 
single measure of a convolutional code’s ability to combat errors due to channel noise is 
the free distance of the code, denoted by $d_{\text{free}}$; it is defined as follows:


A convolutional code with free distance $d_{\text{free}}$ can, therefore, correct t errors if, and only if,
$d_{\text{free}}$ is greater than 2t.

The free distance can be obtained quite simply from the state graph of the convolutional 
encoder. Consider, for example, Figure 10.16b, which shows the state graph of the encoder
of Figure 10.13. Any nonzero code sequence corresponds to a complete path beginning 
and ending at the 00 state (i.e., node a). We thus find it useful to split this node in the 
manner shown in the modified state graph of Figure 10.20, which may be viewed as a 
signal-flow graph with a single input and single output. 

A signal-flow graph consists of nodes and directed branches; it operates by the
following set of rules:

1. A branch multiplies the signal at its input node by the transmittance characterizing
that branch.

2. A node with incoming branches sums the signals produced by all of those branches.

3. The signal at a node is applied equally to all the branches outgoing from that node.

4. The transfer function of the graph is the ratio of the output signal to the input signal.

Figure 10.20 Modified state graph of convolutional encoder.

Returning to the signal-flow graph of Figure 10.20, the exponent of D on a branch in this 
graph describes the Hamming weight of the encoder output corresponding to that branch; 
the symbol D used here should not be confused with the unit-delay variable in Section 
10.6 and the symbol L used herein should not be confused with the length of the message 
sequence. The exponent of L is always equal to one, since the length of each branch is one. 
Let T(D, L) denote the transfer function of the signal-flow graph, with D and L playing the
role of dummy variables. For the example of Figure 10.20, we may readily use rules 1, 2, 
and 3 to obtain the following input-output relations: 


$$\begin{aligned} b &= D^2 L a_0 + L c \\ c &= D L b + D L d \\ d &= D L b + D L d \\ a_1 &= D^2 L c \end{aligned} \tag{10.58}$$

where $a_0$, b, c, d, and $a_1$ denote the node signals of the graph. Solving the system of four
equations in (10.58) for the ratio $a_1/a_0$, we obtain the transfer function


$$T(D, L) = \frac{D^5 L^3}{1 - DL(1 + L)}$$
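The elimination behind this result is short enough to write out. The following is one possible sequence of steps, starting from the four node equations of (10.58) with $D^2 L$ on the branches leaving $a_0$ and entering $a_1$; the order of elimination is a choice made here for convenience.

```latex
\begin{align*}
d &= DLb + DLd
  &&\Rightarrow\quad d = \frac{DL}{1 - DL}\, b \\
c &= DLb + DLd = DLb + \frac{D^2 L^2}{1 - DL}\, b
  &&\Rightarrow\quad c = \frac{DL}{1 - DL}\, b \\
b &= D^2 L a_0 + Lc = D^2 L a_0 + \frac{D L^2}{1 - DL}\, b
  &&\Rightarrow\quad b = \frac{D^2 L (1 - DL)}{1 - DL(1 + L)}\, a_0 \\
a_1 &= D^2 L c = \frac{D^3 L^2}{1 - DL}\, b
  &&\Rightarrow\quad \frac{a_1}{a_0} = \frac{D^5 L^3}{1 - DL(1 + L)}
\end{align*}
```

Note that $1 - DL(1 + L) = 1 - DL - DL^2$, which is the denominator that appears when the terms in b are collected in the third line.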


Using the binomial expansion, we may equivalently express T(D, L) as follows:

$$\begin{aligned} T(D, L) &= D^5 L^3 \left(1 - DL(1 + L)\right)^{-1} \\ &= D^5 L^3 \sum_{i=0}^{\infty} \left(DL(1 + L)\right)^i \end{aligned}$$

Setting L = 1 in this formula, we thus get the distance transfer function expressed in the 
form of a power series as follows: 

$$T(D, 1) = D^5 + 2D^6 + 4D^7 + \cdots \tag{10.60}$$

Since the free distance is the minimum Hamming distance between any two codewords in 
the code and the distance transfer function T(D, 1) enumerates the number of codewords 
that are a given distance apart, it follows that the exponent of the first term in the 
expansion of T(D, 1) in (10.60) defines the free distance. Thus, on the basis of this 
equation, the convolutional code of Figure 10.13 has the free distance $d_{\text{free}} = 5$.

This result indicates that up to two errors in the received sequence are correctable, as 
two or fewer transmission errors will cause the received sequence to be at most at a 
Hamming distance of 2 from the transmitted sequence but at least at a Hamming distance 
of 3 from any other code sequence in the code. In other words, in spite of the presence of 
any pair of transmission errors, the received sequence remains closer to the transmitted 
sequence than any other possible code sequence. However, this statement is no longer true 
if there are three or more closely spaced transmission errors in the received sequence. The 
observations made here reconfirm the results reported earlier in Examples 5 and 6. 
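The free distance and the leading coefficients of T(D, 1) can also be obtained by a direct search of the state graph, which offers an independent check on the transfer-function calculation. The sketch below assumes the generator taps (1, 1, 1) and (1, 0, 1), which are not stated in this passage; the function names are hypothetical. It uses a shortest-path search for the free distance and an exhaustive enumeration of paths that leave the all-zero state and return to it for the first time.

```python
import heapq
from collections import Counter

# Assumed generator taps for the rate-1/2, constraint-length-3 encoder.
G = ((1, 1, 1), (1, 0, 1))


def branch(state, bit):
    """Return (next_state, Hamming weight of the branch output)."""
    window = (bit,) + state  # (m_j, m_{j-1}, m_{j-2})
    weight = sum(sum(g * w for g, w in zip(taps, window)) % 2 for taps in G)
    return (bit, state[0]), weight


def free_distance():
    """Least output weight over paths that leave state 00 and return to it."""
    start, w0 = branch((0, 0), 1)  # the only branch that leaves the zero state
    best = {start: w0}
    heap = [(w0, start)]
    while heap:
        w, state = heapq.heappop(heap)
        if state == (0, 0):
            return w
        for bit in (0, 1):
            nxt, bw = branch(state, bit)
            if w + bw < best.get(nxt, float("inf")):
                best[nxt] = w + bw
                heapq.heappush(heap, (w + bw, nxt))
    return None


def weight_spectrum(max_weight):
    """Count first-return paths by output weight, up to max_weight."""
    counts = Counter()
    stack = [branch((0, 0), 1)]  # entries are (state, accumulated weight)
    while stack:
        state, w = stack.pop()
        if state == (0, 0):
            counts[w] += 1
            continue
        for bit in (0, 1):
            nxt, bw = branch(state, bit)
            if w + bw <= max_weight:
                stack.append((nxt, w + bw))
    return dict(sorted(counts.items()))


print(free_distance())     # 5
print(weight_spectrum(7))  # {5: 1, 6: 2, 7: 4}
```

The counts 1, 2, and 4 match the coefficients of $D^5$, $D^6$, and $D^7$ in the expansion of T(D, 1). The enumeration terminates because every cycle in the modified state graph carries a positive output weight.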
The transfer function of the encoder’s state graph, modified in a manner similar to that
illustrated in Figure 10.20, may be used to evaluate a bound on the BER for a given 
decoding scheme; details of this evaluation are, however, beyond the scope of our present 
discussion. Here, we simply summarize the results for two special channels, namely the 
binary symmetric channel and the binary-input AWGN channel, assuming the use of 
binary PSK with coherent detection. 

1. Binary symmetric channel.

The binary symmetric channel may be modeled as an AWGN channel with binary
PSK as the modulation in the transmitter followed by hard-decision demodulation in
the receiver. The transition probability p of the binary symmetric channel is then
equal to the BER for the uncoded binary PSK system. From Chapter 7 we recall that
for large values of $E_b/N_0$, denoting the ratio of signal energy per bit to noise power
spectral density, the BER for binary PSK without coding is dominated by the
exponential factor $\exp(-E_b/N_0)$. On the other hand, the BER for the same
modulation scheme with convolutional coding is dominated by the exponential
factor $\exp(-d_{\text{free}} r E_b / 2N_0)$, where r is the code rate and $d_{\text{free}}$ is the free distance of
the convolutional code. Therefore, as a figure of merit for measuring the
improvement in error performance made by the use of coding with hard-decision
decoding, we may set aside the $E_b/N_0$ to use the remaining exponent to define the
asymptotic coding gain (in decibels) as follows:

$$G_a = 10 \log_{10}\left(\frac{d_{\text{free}}\, r}{2}\right) \text{ dB} \tag{10.61}$$



2. Binary-input AWGN channel.

Consider next the case of a memoryless binary-input AWGN channel with no output
quantization (i.e., the output amplitude lies in the interval $(-\infty, \infty)$). For this
channel, theory shows that for large values of $E_b/N_0$ the BER for binary PSK with
convolutional coding is dominated by the exponential factor $\exp(-d_{\text{free}} r E_b / N_0)$,
where the parameters are as previously defined. Accordingly, in this second case, we
find that the asymptotic coding gain is defined by

$$G_a = 10 \log_{10}(d_{\text{free}}\, r) \text{ dB} \tag{10.62}$$

Comparing (10.61) and (10.62) for cases 1 and 2, respectively, we see that the asymptotic 
coding gain for the binary-input AWGN channel is greater than that for the binary 
symmetric channel by 3 dB. In other words, for large $E_b/N_0$, the transmitter for a binary
symmetric channel must generate an additional 3 dB of signal energy (or power) over that 
for a binary-input AWGN channel if we are to achieve the same error performance. 
Clearly, there is an advantage to be gained by using an unquantized demodulator output in 
place of making hard decisions. This improvement in performance, however, is attained at 
the cost of increased decoder complexity due to the requirement for accepting analog 
inputs. 
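As a quick numerical check, the two gains can be evaluated for the code considered earlier, with free distance 5 and rate 1/2. The function below is a small sketch; its name and the `hard_decision` flag are hypothetical.

```python
import math


def asymptotic_coding_gain_db(d_free, rate, hard_decision):
    """Asymptotic coding gain in dB.

    Hard decisions (binary symmetric channel): 10*log10(d_free * rate / 2).
    Unquantized binary-input AWGN channel:     10*log10(d_free * rate).
    """
    factor = d_free * rate / 2 if hard_decision else d_free * rate
    return 10 * math.log10(factor)


hard = asymptotic_coding_gain_db(5, 0.5, hard_decision=True)
soft = asymptotic_coding_gain_db(5, 0.5, hard_decision=False)
print(round(hard, 2), round(soft, 2), round(soft - hard, 2))  # 0.97 3.98 3.01
```

The difference between the two values is 10 log10(2), or about 3 dB, regardless of the code, since the two expressions differ only by the factor of 2 inside the logarithm.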

It turns out that the asymptotic coding gain for a binary-input AWGN channel is
approximated to within about 0.25 dB by a binary-input, Q-ary output discrete memoryless
channel with the number of representation levels Q = 8. This means that, for practical
purposes, we may avoid the need for an analog decoder by using a soft-decision decoder
that performs finite output quantization (typically, Q = 8), and yet realize a performance
close to the optimum.


When the received sequence is very long, the storage requirement of the Viterbi algorithm 
becomes too high, in which case some compromises must be made. The approach usually 
taken in practice is to “truncate” the path memory of the decoder as follows: 


Naturally, decoding decisions made in the way just described are no longer truly 
maximum likelihood, but they can be made almost as good provided that the decoding 
window is chosen long enough. Experience and analysis have shown that satisfactory 
results are obtained if the decoding window length l is on the order of five times the 
constraint length v of the convolutional code or more. 


Maximum a Posteriori Probability Decoding of 
Convolutional Codes 


Summarizing the discussion on convolutional decoding presented in Section 10.8, we may 
say that, given a received vector r that is the noisy version of a convolutionally encoded 
vector c, the Viterbi algorithm computes the code vector $\hat{c}$ for which the log-likelihood
function is maximum; for a binary symmetric channel, the code vector $\hat{c}$ is the one at
minimum Hamming distance from the received vector r. For the
more general case of an AWGN channel, this result is equivalent to finding the vector $\hat{c}$
that is the closest to the received vector r in Euclidean distance. Simply put then: given the
vector r, the Viterbi algorithm finds the most likely vector $\hat{c}$ that minimizes the
conditional probability $P(\hat{c} \ne c \mid r)$, which is the sequence error or the word error rate.

In practice, however, we are often interested in the BER, defined as the conditional 
probability $P(\hat{m}_i \ne m_i \mid r)$, where $\hat{m}_i$ is an estimate of the ith bit of message vector m.
Recognizing the fact that the BER can indeed assume a value different from the sequence 
error, we need a probabilistic decoding algorithm that minimizes the BER. 

Bahl, Cocke, Jelinek, and Raviv (1974) are credited with deriving an algorithm that
maximizes the a posteriori probabilities of the states in the decoding model as well as the
transition probability from one state to another. In the course of time, this decoding
algorithm has become known as the BCJR algorithm in honor of its four inventors. The
BCJR algorithm is applicable to any linear code, be it of a block or convolutional kind.
However, as we may well expect, the computational complexity of the BCJR algorithm is
greater than that of the Viterbi algorithm. When the message bits in the received
vector r are equally likely, the Viterbi algorithm is preferred over the BCJR algorithm.
When, however, the message bits are not equally likely, then the BCJR algorithm provides 
a better decoding performance than the Viterbi algorithm. Moreover, in iterative decoding 
exemplified by turbo decoding (to be discussed in Section 10.12), the a priori probabilities 
of the message bits may change from one iteration to the next; in such a scenario, the 
BCJR algorithm provides the best performance. 

Henceforth, the two terminologies, BCJR algorithm and maximum a posteriori 
probability (MAP) decoding algorithm, are used interchangeably. 

The function of the MAP decoder is to compute the values of log-a-posteriori ratios, on 
the basis of which estimates of the original message bits are computed in the receiver. In 
what follows, we derive the MAP decoding algorithm for the case of rate 1/n
convolutional codes applied to a binary input-continuous output AWGN channel.
Henceforth, in this section, we use the mapping of bits 0 and 1 as follows:

bit 0 → level -1
bit 1 → level +1

Thus, given a message sequence of block length L, we express the message vector m as
follows:

$$m = (m_0, m_1, \ldots, m_{L-1})$$

where

$$m_j = \pm 1 \quad \text{for } j = 0, 1, \ldots, L - 1$$

The individual elements in the message vector m are referred to as message bits. In any
event, the vector m is encoded into the codeword c, which, in turn, produces the noisy
received signal vector r at the channel output. Note, however, that the elements of the vector r
can assume positive as well as negative values, which, in theory, can be infinitely large due
to the analog nature of the additive channel noise.

Before proceeding further, there are two natural logarithmic concepts, namely
log-likelihood ratios, that will occupy our attention in deriving the MAP decoding algorithm:

1. A priori L-values, denoted by $L_a(m_j)$, which define the natural logarithmic ratio of the a
priori probabilities of message bits $m_j = -1$ and $m_j = +1$, generated by a source at the
encoder input in the transmitter.

2. A posteriori L-values, denoted by $L_p(m_j)$, which define the log-likelihood ratio of
the conditional a posteriori probabilities of the message bits $m_j = -1$ and $m_j = +1$,
given the channel output at the decoder input in the receiver.

In what follows, we will focus on $L_p(m_j)$ first, deferring the discussion of $L_a(m_j)$ until later
in this section.

With the message bit $m_j = \pm 1$, there are two conditional probabilities to be considered:
$P(m_j = +1 \mid r)$ and $P(m_j = -1 \mid r)$. These two probabilities are called the a posteriori
probabilities (APPs). In terms of these two APPs, the log-a-posteriori L-value is defined
by

$$L_p(m_j) = \ln\left(\frac{P(m_j = +1 \mid r)}{P(m_j = -1 \mid r)}\right) \tag{10.63}$$
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Hereafter, for the sake of brevity, we refer to the LJrrij) simply as the a posteriori L-value 
of message bit ntj at time-unit j. Having computed a set of L p -values, the decoder makes a 
hard decision by applying the two-part formula: 


irij 


+1 if L p (ntj) > 0, 
-1 if L p (mj)< 0, 


7 = 0, 1, 1 


where L is the length of the message sequence; L must not be confused with the two 
L-values, L a (nij) and LJmj). 
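
As a small illustration of the L-value definition and the hard-decision rule, the following Python sketch converts an a posteriori probability $P(m_j = +1 \mid \mathbf{r})$ into an L-value and then into a decision. The function names `l_value` and `hard_decision` are hypothetical, and the example probabilities are made up. The tie case $L_p(m_j) = 0$, which the two-part formula leaves open, is sent to +1 here purely as an assumption.

```python
import math

def l_value(p_plus: float) -> float:
    """Log-likelihood ratio ln(P(m=+1|r) / P(m=-1|r)) for 0 < p_plus < 1."""
    return math.log(p_plus / (1.0 - p_plus))

def hard_decision(lp: float) -> int:
    """Map an a posteriori L-value to +1 or -1 (a tie goes to +1 by assumption)."""
    return 1 if lp >= 0 else -1

# Example APPs for three message bits (illustrative numbers, not from the text).
apps = [0.9, 0.5, 0.2]
l_values = [l_value(p) for p in apps]
decisions = [hard_decision(lp) for lp in l_values]
```

The sign of the L-value carries the decision, while its magnitude indicates how reliable that decision is.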

Given the received vector r, the conditional probability $P(m_j = +1 \mid \mathbf{r})$ is expressed in 
terms of the joint probability density function $f(m_j = +1, \mathbf{r})$ as follows: 

$$P(m_j = +1 \mid \mathbf{r}) = \frac{f(m_j = +1, \mathbf{r})}{f(\mathbf{r})}$$

where $f(\mathbf{r})$ is the probability density function of the received vector r; this formula follows 
from the definition of joint probability. 

Similarly, we may express the second conditional probability $P(m_j = -1 \mid \mathbf{r})$ as follows: 

$$P(m_j = -1 \mid \mathbf{r}) = \frac{f(m_j = -1, \mathbf{r})}{f(\mathbf{r})}$$

Accordingly, using these two conditional probabilities and canceling the common term $f(\mathbf{r})$, 
we may reformulate the a posteriori L-values of (10.63) in the equivalent form 

$$L_p(m_j) = \ln\left(\frac{f(m_j = +1, \mathbf{r})}{f(m_j = -1, \mathbf{r})}\right) \tag{10.65}$$

which sets the stage for deriving the MAP decoding algorithm. 


With computational complexity being at a premium, we propose to exploit the lattice 
structure of the convolutional code as the basis for deriving the MAP decoding algorithm. 
To this end, let $\Sigma_j^+$ denote the set of all state-pairs for which the states $s_j = s'$ and 
$s_{j+1} = s$ correspond to message bit $m_j = +1$. We may then express the joint 
probability density function $f(m_j = +1, \mathbf{r})$ in the expanded form: 

$$f(m_j = +1, \mathbf{r}) \propto \sum_{(s', s) \in \Sigma_j^+} f(s_j = s', s_{j+1} = s, \mathbf{r}) \tag{10.66}$$

where the symbol $\propto$ stands for proportionality. In a similar way, we may reformulate the 
other joint probability density function as follows: 

$$f(m_j = -1, \mathbf{r}) \propto \sum_{(s', s) \in \Sigma_j^-} f(s_j = s', s_{j+1} = s, \mathbf{r}) \tag{10.67}$$

where $\Sigma_j^-$ is the set of all state-pairs for which the state-pair $s_j = s'$ and $s_{j+1} = s$ 
corresponds to the message bit $m_j = -1$. Hence, substituting (10.66) and (10.67) into 
(10.65) and recognizing that the proportionality factor is common to both (10.66) and 
(10.67), thereby canceling out, the a posteriori L-value $L_p(m_j)$ of message bit $m_j$ at time-unit j 
takes the following equivalent form: 

$$L_p(m_j) = \ln\left(\frac{\sum_{(s', s) \in \Sigma_j^+} f(s_j = s', s_{j+1} = s, \mathbf{r})}{\sum_{(s', s) \in \Sigma_j^-} f(s_j = s', s_{j+1} = s, \mathbf{r})}\right) \tag{10.68}$$


Equation (10.68) provides the mathematical basis for forward-backward computation of 
the MAP decoding algorithm. In this context, it is important to note the following point in 
(10.68): 


Our next task is to show how the pair of joint probability density functions in (10.68) can 
be computed recursively, using forward and backward recursions. 

With this important point in mind, we introduce some new and relevant terminology. 
First, we express the received vector r as the triplet 

$$\mathbf{r} = (\mathbf{r}_{t>j}, \mathbf{r}_j, \mathbf{r}_{t<j})$$

where the two new terms $\mathbf{r}_{t<j}$ and $\mathbf{r}_{t>j}$ denote those portions of the received vector r that 
appear before and after time-unit j, respectively. Moreover, we simplify the notation by 
using $s'$ and $s$ in place of $s_j = s'$ and $s_{j+1} = s$, respectively, recognizing that the 
time-unit j is implicitly contained in $L_p(m_j)$. 

In particular, the joint probability density function common to the numerator and 
denominator in (10.68) is now rewritten as 

$$f(s_j = s', s_{j+1} = s, \mathbf{r}) = f(s', s, \mathbf{r}_{t>j}, \mathbf{r}_j, \mathbf{r}_{t<j}) \tag{10.69}$$

Moreover, before proceeding further, we find it instructive to introduce two assumptions 
that are basic to the derivation of the MAP decoding algorithm: 

Markovian Assumption 


Under this assumption, convolutional encoding of the message vector performed in 
the transmitter is said to be a Markov chain. 

Memoryless Assumption 


In other words, the channel has no knowledge of the past. 

Resuming the discussion on the log a posteriori L-value $L_p(m_j)$ in (10.68), we use the 
definition of joint probability density function to express the right-hand side of (10.69) as 
follows: 

$$f(s', s, \mathbf{r}_{t>j}, \mathbf{r}_j, \mathbf{r}_{t<j}) = f(\mathbf{r}_{t>j} \mid s', s, \mathbf{r}_j, \mathbf{r}_{t<j})\, f(s', s, \mathbf{r}_j, \mathbf{r}_{t<j})$$

Focusing on the conditional probability density function on the right-hand side of this 
equality, we invoke the Markovian assumption to recognize that, once the state $s = s_{j+1}$ is 
known, the vector $\mathbf{r}_{t>j}$ representing the received vector r after time-unit j carries no 
further dependence on the following three entities: 

• the state $s' = s_j$, 

• the vector $\mathbf{r}_j$ at time-unit j, and 

• the vector $\mathbf{r}_{t<j}$ received before time-unit j. 

Accordingly, we may simplify matters by writing 

$$f(\mathbf{r}_{t>j} \mid s', s, \mathbf{r}_j, \mathbf{r}_{t<j}) = f(\mathbf{r}_{t>j} \mid s) \tag{10.70}$$

where s denotes the state $s_{j+1}$. 

Next, we again use the definition of joint probability density function to write 

$$f(s', s, \mathbf{r}_j, \mathbf{r}_{t<j}) = f(s, \mathbf{r}_j \mid s', \mathbf{r}_{t<j})\, f(s', \mathbf{r}_{t<j})$$

Focusing on the conditional probability density function $f(s, \mathbf{r}_j \mid s', \mathbf{r}_{t<j})$ and 
invoking the Markovian assumption one more time, we recognize that, once the state $s'$ is 
known, the new state s and the received vector $\mathbf{r}_j$ at time-unit j do not depend on the past 
vector $\mathbf{r}_{t<j}$. Hence, we may further simplify matters by writing 

$$f(s, \mathbf{r}_j \mid s', \mathbf{r}_{t<j}) = f(s, \mathbf{r}_j \mid s') \tag{10.71}$$

where the states $s = s_{j+1}$ and $s' = s_j$. 

Collecting the results obtained in (10.70) and (10.71), we are finally ready to express 
the probability density function common to the numerator and denominator of (10.68) as 
follows: 

$$f(s', s, \mathbf{r}) = f(\mathbf{r}_{t>j} \mid s)\, f(s, \mathbf{r}_j \mid s')\, f(s', \mathbf{r}_{t<j})$$

which provides the mathematical basis for recursive implementation of the MAP decoding 
algorithm. 


To simplify the computational steps involved in deriving the algorithm, we now introduce 
the following three algorithmic metrics: 

$$\alpha_j(s') = f(s', \mathbf{r}_{t<j}) \tag{10.73}$$

$$\gamma_j(s', s) = f(s, \mathbf{r}_j \mid s') \tag{10.74}$$

$$\beta_{j+1}(s) = f(\mathbf{r}_{t>j} \mid s) \tag{10.75}$$

Using these three metrics, we may finally express the probability density function 
common to the numerator and denominator of (10.68) in the simplified form: 

$$f(s', s, \mathbf{r}) = \beta_{j+1}(s)\, \gamma_j(s', s)\, \alpha_j(s') \tag{10.76}$$

in light of which, hereafter, the three metrics are referred to as follows: 

forward metric $\alpha_j(s')$ 

branch metric $\gamma_j(s', s)$ 

backward metric $\beta_{j+1}(s)$ 

As the names would imply, the forward and backward metrics play key roles in the forward 
and backward recursions of the MAP decoding algorithm, respectively. As for the branch 
metric, its role is to couple these two recursions to work together in a harmonious manner. 


Updating the forward metric has the effect of moving from state s' at time-unit j to state s 
at time-unit j + 1; hence, we write 

$$\begin{aligned} \alpha_{j+1}(s) &= f(s, \mathbf{r}_{t<j+1}) \\ &= \sum_{s' \in \sigma_j} f(s', s, \mathbf{r}_{t<j+1}) \end{aligned}$$

where $\sigma_j$ is the set of all the states at time-unit j. Using the definition of a joint probability 
density function, we write 

$$\begin{aligned} \alpha_{j+1}(s) &= \sum_{s' \in \sigma_j} f(s, \mathbf{r}_j \mid s', \mathbf{r}_{t<j})\, f(s', \mathbf{r}_{t<j}) \\ &= \sum_{s' \in \sigma_j} f(s, \mathbf{r}_j \mid s')\, f(s', \mathbf{r}_{t<j}) \end{aligned}$$

where, in the second line, we used the Markovian assumption to drop the conditioning on $\mathbf{r}_{t<j}$. 
Hence, using the defining equations for the branch and forward metrics in (10.74) and 
(10.73), respectively, we simplify matters by writing 

$$\alpha_{j+1}(s) = \sum_{s' \in \sigma_j} \gamma_j(s', s)\, \alpha_j(s') \tag{10.77}$$

For obvious reasons, (10.77) is called the forward recursion; this recursion is illustrated 
graphically in Figure 10.21a. 
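
The forward recursion can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `forward_metrics`, the dictionary layout of the branch metrics, and the toy numbers are all assumptions made for the example, and the encoder is assumed to start in state 0.

```python
def forward_metrics(gammas, num_states, start_state=0):
    """Run alpha_{j+1}(s) = sum over s' of gamma_j(s', s) * alpha_j(s').

    gammas[j] maps each valid transition (s_prev, s_next) to its branch
    metric; transitions absent from the dict contribute nothing.
    """
    alpha = [[0.0] * num_states]
    alpha[0][start_state] = 1.0  # encoder assumed to start in this state
    for gamma_j in gammas:
        nxt = [0.0] * num_states
        for (s_prev, s_next), g in gamma_j.items():
            nxt[s_next] += g * alpha[-1][s_prev]
        alpha.append(nxt)
    return alpha

# Two-state toy trellis with made-up branch metrics (illustrative only).
gammas = [
    {(0, 0): 0.2, (0, 1): 0.8},
    {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.9, (1, 1): 0.1},
]
alpha = forward_metrics(gammas, num_states=2)
```

Each pass of the outer loop advances the trellis by one time-unit, so the cost grows linearly with the block length and with the number of valid transitions per stage.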


To formulate the recursion for the backward metric, we move from state s at time-unit j + 1 
back to state s' at time-unit j. Adapting the use of (10.75) to the scenario just described, 
we write 

$$\beta_j(s') = f(\mathbf{r}_{t>j-1} \mid s')$$

The portion of the received vector denoted by $\mathbf{r}_{t>j-1}$ may be equivalently expressed as 
follows: 

$$\mathbf{r}_{t>j-1} = \mathbf{r}_{t+1>j} = (\mathbf{r}_j, \mathbf{r}_{t>j})$$

Correspondingly, the backward metric $\beta_j(s')$ is reformulated as shown by 

$$\begin{aligned} \beta_j(s') &= f(\mathbf{r}_j, \mathbf{r}_{t>j} \mid s') \\ &= \sum_{s \in \sigma_{j+1}} f(s, \mathbf{r}_j, \mathbf{r}_{t>j} \mid s') \end{aligned}$$

where $\sigma_{j+1}$ is the set of all states at time-unit j + 1. Here again, using the definition of joint 
probability density function, we write 

$$\begin{aligned} \beta_j(s') &= \sum_{s \in \sigma_{j+1}} f(s, \mathbf{r}_j, \mathbf{r}_{t>j} \mid s') \\ &= \sum_{s \in \sigma_{j+1}} \frac{f(s', s, \mathbf{r}_j, \mathbf{r}_{t>j})}{P(s')} \\ &= \sum_{s \in \sigma_{j+1}} \frac{f(\mathbf{r}_{t>j} \mid s', s, \mathbf{r}_j)\, f(s', s, \mathbf{r}_j)}{P(s')} \end{aligned}$$

To simplify matters, we note the following two points: 

1. Under the memoryless assumption, the received vector $\mathbf{r}_{t>j}$ at the channel output 
depends only on the state in which the encoder was residing at time-unit j + 1, namely s. We 
may, therefore, write 

$$f(\mathbf{r}_{t>j} \mid s', s, \mathbf{r}_j) = f(\mathbf{r}_{t>j} \mid s) = \beta_{j+1}(s)$$

2. Invoking the definition of joint probability density function one more time, we have 

$$f(s', s, \mathbf{r}_j) = f(s, \mathbf{r}_j \mid s')\, P(s') = \gamma_j(s', s)\, P(s')$$

Accordingly, substituting the two results under points 1 and 2 into the formula for $\beta_j(s')$ 
and canceling the common term $P(s')$, we get 

$$\beta_j(s') = \sum_{s \in \sigma_{j+1}} \gamma_j(s', s)\, \beta_{j+1}(s) \tag{10.78}$$

For obvious reasons, (10.78) is called the backward recursion; this second recursion is 
illustrated graphically in Figure 10.21b. 

Figure 10.21. Illustrating the computation of (a) forward-metric and (b) backward-metric recursions. 


Typically, the encoder starts in the all-zero state, denoted by $s_0 = 0$. Correspondingly, the 
forward recursion of (10.77) begins operating at time-unit j = 0 under the following initial 
condition: 

$$\alpha_0(s) = \begin{cases} 1, & s = 0 \\ 0, & s \neq 0 \end{cases} \tag{10.79}$$

which follows from the fact that the convolutional encoder starts in the all-zero state. 
Thus, $\alpha_{j+1}(s)$ is recursively computed forward in time at j = 0, 1, ..., K - 1, where the 
overall length of the input data stream is 

$$K = L + L'$$

in which L and L' denote the lengths of the message and termination sequences, respectively. 

Similarly, the backward recursion of (10.78) begins at time-unit j = K under the 
following initial condition: 

$$\beta_K(s) = \begin{cases} 1, & s = 0 \\ 0, & s \neq 0 \end{cases} \tag{10.80}$$

Since the encoder ends in the all-zero state, we recursively compute $\beta_j(s')$ backward in 
time at j = K - 1, K - 2, ..., 0. 
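
The backward recursion, together with its terminating initial condition, can be sketched in the same spirit. The function name `backward_metrics`, the dictionary layout of the branch metrics, and the toy numbers are assumptions made for this example; the trellis is assumed to end in state 0.

```python
def backward_metrics(gammas, num_states, end_state=0):
    """Run beta_j(s') = sum over s of gamma_j(s', s) * beta_{j+1}(s), j = K-1, ..., 0."""
    K = len(gammas)
    beta = [[0.0] * num_states for _ in range(K + 1)]
    beta[K][end_state] = 1.0  # terminated encoder assumed to end in this state
    for j in range(K - 1, -1, -1):
        for (s_prev, s_next), g in gammas[j].items():
            beta[j][s_prev] += g * beta[j + 1][s_next]
    return beta

# Two-step toy trellis with made-up branch metrics (illustrative only).
gammas = [
    {(0, 0): 0.2, (0, 1): 0.8},
    {(0, 0): 0.5, (1, 0): 0.9},  # termination step: only branches into state 0
]
beta = backward_metrics(gammas, num_states=2)
```

For a terminated trellis, $\beta_0(0)$ equals $\alpha_K(0)$, since both sum the weights of all paths that leave and re-enter the all-zero state. That equality is a convenient sanity check on an implementation.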


Thus far, we have accounted for all the issues important to the MAP decoder except for the 
discrete-input, continuous-output AWGN channel, which naturally comes into play in 
evaluating the branch metric: a necessary requirement. This issue was discussed in Example 
10 in Chapter 5. For this evaluation, we first rewrite the defining equation (10.74) as follows: 

$$\begin{aligned} \gamma_j(s', s) &= f(s, \mathbf{r}_j \mid s') \\ &= \left(\frac{P(s', s)}{P(s')}\right)\left(\frac{f(s', s, \mathbf{r}_j)}{P(s', s)}\right) \\ &= P(s \mid s')\, f(\mathbf{r}_j \mid s', s) \end{aligned}$$

which may be transformed into a more desirable form that involves the message bit $m_j$ and 
the corresponding code vector $\mathbf{c}_j$, as shown by 

$$\gamma_j(s', s) = P(m_j)\, f(\mathbf{r}_j \mid \mathbf{c}_j) \tag{10.81}$$

Justification for this transformation may be explained as follows: 

1. The transition from the state $s' = s_j$ to the new state $s = s_{j+1}$ is attributed to the 
message bit $m_j$ applied to the convolutional encoder at time-unit j; hence, we may 
substitute the probability $P(m_j)$ for the conditional probability $P(s \mid s')$. 

2. The state transition $(s', s)$ may be viewed as another way of referring to the code 
vector $\mathbf{c}_j$; hence, we may substitute the conditional probability density function 
$f(\mathbf{r}_j \mid \mathbf{c}_j)$ for $f(\mathbf{r}_j \mid s', s)$. 

In (10.81), $m_j$ is the message bit at the encoder’s input and $\mathbf{c}_j$ is the code vector defining 
the encoded bits pertaining to the state transition $s' \to s$ at time-unit j. When this state 
transition is a valid one, the conditional probability density function $f(\mathbf{r}_j \mid \mathbf{c}_j)$, defining the 
input-output statistical behavior of the channel, assumes the following form: 

$$f(\mathbf{r}_j \mid \mathbf{c}_j) = \left(\frac{E_s}{\pi N_0}\right)^{n/2} \exp\left(-\frac{E_s}{N_0}\|\mathbf{r}_j - \mathbf{c}_j\|^2\right) \tag{10.82}$$

where $E_s$ is the transmitted energy per symbol, n is the number of bits in each codeword, 
$N_0/2$ is the power spectral density of the additive white Gaussian channel noise, and 
$\|\mathbf{r}_j - \mathbf{c}_j\|^2$ is the squared Euclidean distance between the transmitted vector $\mathbf{c}_j$ at the 
channel input and the received vector $\mathbf{r}_j$ at the channel output at time-unit j. Thus, 
substituting (10.82) into (10.81) yields 

$$\gamma_j(s', s) = P(m_j)\left(\frac{E_s}{\pi N_0}\right)^{n/2} \exp\left(-\frac{E_s}{N_0}\|\mathbf{r}_j - \mathbf{c}_j\|^2\right) \tag{10.83}$$

This equation holds if, and only if, the state transition $s' \to s$ at time-unit j is a valid one; 
otherwise, the state-transition probability $P(s \mid s')$ is zero, in which case the branch metric 
$\gamma_j(s', s)$ is also zero. 
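
The branch metric for a valid transition can be evaluated directly. The sketch below assumes the Gaussian form $\gamma_j(s', s) = P(m_j)\,(E_s/(\pi N_0))^{n/2}\exp(-(E_s/N_0)\|\mathbf{r}_j - \mathbf{c}_j\|^2)$, with the received vector normalized so that the code levels are ±1. The function name `branch_metric` and the numerical values are hypothetical.

```python
import math

def branch_metric(p_m, r_j, c_j, es_over_n0):
    """P(m_j) * (Es/(pi*N0))^(n/2) * exp(-(Es/N0) * ||r_j - c_j||^2) for a valid branch."""
    n = len(c_j)
    dist_sq = sum((r - c) ** 2 for r, c in zip(r_j, c_j))
    scale = (es_over_n0 / math.pi) ** (n / 2)
    return p_m * scale * math.exp(-es_over_n0 * dist_sq)

# Illustrative values: equally likely bit, n = 2, Es/N0 = 0.5.
g_near = branch_metric(0.5, [0.8, 0.1], [+1, +1], 0.5)
g_far = branch_metric(0.5, [0.8, 0.1], [-1, -1], 0.5)
```

Only the ratio of branch metrics matters in the final L-value. Here `g_near / g_far` depends on the received vector solely through its inner products with the two candidate code vectors.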


At this point in the discussion, we are ready to revisit the a priori L-value $L_a(m_j)$, 
introduced earlier in this section. Specifically, with the message bit $m_j$ taking the value 
+1 or -1, we may follow the format of (10.63) to define the a priori L-value of $m_j$ as 
follows: 

$$\begin{aligned} L_a(m_j) &= \ln\left(\frac{P(m_j = +1)}{P(m_j = -1)}\right) \\ &= \ln\left(\frac{P(m_j = +1)}{1 - P(m_j = +1)}\right) \end{aligned} \tag{10.84}$$

where, in the second line, we used the following axiom from probability theory: 

$$P(m_j = +1) + P(m_j = -1) = 1$$

or, equivalently, 

$$P(m_j = -1) = 1 - P(m_j = +1)$$

Solving the second line of (10.84) for $P(m_j = +1)$ in terms of the a priori L-value $L_a(m_j)$, 
we get 


$$P(m_j = +1) = \frac{1}{1 + \exp(-L_a(m_j))}$$

Correspondingly, 

$$P(m_j = -1) = \frac{\exp(-L_a(m_j))}{1 + \exp(-L_a(m_j))}$$

This latter pair of equations for the two probabilities of $m_j = -1$ and $m_j = +1$ may be 
combined into a single equation, as shown by 

$$P(m_j) = \left[\frac{\exp(-L_a(m_j)/2)}{1 + \exp(-L_a(m_j))}\right] \exp\left(\frac{1}{2} m_j L_a(m_j)\right) \tag{10.85}$$

where $m_j = \pm 1$. The important point to note in (10.85) is that the first term on the 
right-hand side of the equation turns out to be independent of $m_j = \pm 1$; hence, this term may be 
treated as a constant. 
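
The factored form of the a priori probability is easy to check numerically. The sketch below assumes the form $P(m_j) = \left[\exp(-L_a/2)/(1 + \exp(-L_a))\right]\exp(m_j L_a/2)$; the function name `prior_from_l_value` and the value of `l_a` are hypothetical.

```python
import math

def prior_from_l_value(m_j: int, l_a: float) -> float:
    """P(m_j) for m_j in {+1, -1}, written as (common factor) * exp(m_j * L_a / 2)."""
    common = math.exp(-l_a / 2) / (1 + math.exp(-l_a))  # same for both signs
    return common * math.exp(m_j * l_a / 2)

l_a = 1.2  # illustrative a priori L-value
p_plus = prior_from_l_value(+1, l_a)
p_minus = prior_from_l_value(-1, l_a)
```

The two probabilities sum to one, and their log-ratio returns the L-value that was put in, which confirms that the common factor carries no information about the sign of $m_j$.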

Turning next to the exponential term in (10.83), we may express the exponent of the 
second term as follows: 

$$\begin{aligned} -\frac{E_s}{N_0}\|\mathbf{r}_j - \mathbf{c}_j\|^2 &= -\frac{E_s}{N_0}\sum_{l=1}^{n}(r_{jl} - c_{jl})^2 \\ &= -\frac{E_s}{N_0}\sum_{l=1}^{n}\left(r_{jl}^2 - 2 r_{jl} c_{jl} + c_{jl}^2\right) \\ &= -\frac{E_s}{N_0}\left(\|\mathbf{r}_j\|^2 - 2\,\mathbf{r}_j^T \mathbf{c}_j + \|\mathbf{c}_j\|^2\right) \end{aligned}$$

where $E_s$ is the transmitted symbol energy, and the terms inside the parentheses are 

$$\|\mathbf{r}_j\|^2 = \sum_{l=1}^{n} r_{jl}^2 \tag{10.87}$$

$$\mathbf{r}_j^T \mathbf{c}_j = \sum_{l=1}^{n} r_{jl}\, c_{jl} \tag{10.88}$$

$$\|\mathbf{c}_j\|^2 = \sum_{l=1}^{n} c_{jl}^2 = n \tag{10.89}$$

The terms $r_{jl}$ and $c_{jl}$ denote the individual bits in the received vector $\mathbf{r}_j$ and code vector $\mathbf{c}_j$ 
at time-unit j, and n denotes the number of bits in each of $\mathbf{r}_j$ and $\mathbf{c}_j$. Note also that in 
(10.88) the term $\mathbf{r}_j^T \mathbf{c}_j$ denotes the inner product of the vectors $\mathbf{r}_j$ and $\mathbf{c}_j$. 

In light of (10.87) to (10.89), we make three observations: 

1. The term $(E_s/N_0)\|\mathbf{r}_j\|^2$ depends only on the channel SNR and the squared 
magnitude of the received vector $\mathbf{r}_j$. 

2. The third product term $(E_s/N_0)\|\mathbf{c}_j\|^2$ depends only on the channel SNR and the 
squared magnitude of the transmitted code vector $\mathbf{c}_j$. 

3. The remaining product term $2(E_s/N_0)\,\mathbf{r}_j^T \mathbf{c}_j$ is the only one that contains useful 
information for detection in the receiver by virtue of the inner product $\mathbf{r}_j^T \mathbf{c}_j$ that 
correlates the received vector $\mathbf{r}_j$ with the transmitted code vector $\mathbf{c}_j$, as shown in (10.88). 

In light of these observations and the observation made previously that the bracketed 
fractional term in (10.85) does not depend on whether the symbol $m_j$ is +1 or -1, we may 
simplify the formula for the transition metric $\gamma_j(s', s)$ in (10.83) as follows: 



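$$\gamma_j(s', s) = A_j B_j \exp\left(\frac{1}{2} m_j L_a(m_j)\right) \exp\left(\frac{1}{2} L_c\, \mathbf{r}_j^T \mathbf{c}_j\right), \quad j = 0, 1, \ldots, L-1 \tag{10.90}$$

where $L_c$ denotes the channel reliability factor, defined by 

$$L_c = \frac{4E_s}{N_0}$$

As for the two multiplying factors $A_j$ and $B_j$, they are respectively defined by 

$$A_j = \frac{\exp(-L_a(m_j)/2)}{1 + \exp(-L_a(m_j))}, \quad j = 0, 1, \ldots, L-1 \tag{10.92}$$

and 

$$B_j = \left(\frac{E_s}{\pi N_0}\right)^{n/2} \exp\left(-\frac{E_s}{N_0}\left(\|\mathbf{r}_j\|^2 + n\right)\right), \quad j = 0, 1, \ldots, L-1 \tag{10.93}$$

where, as before, n is the number of bits in each transmitted codeword. 

Equations (10.90), (10.92), and (10.93) apply to the message bits of length L. However, 
for the termination bits we have 

$$P(m_j) = 1 \quad \text{and} \quad L_a(m_j) = \pm\infty, \quad j = L, L+1, \ldots, K-1 \tag{10.94}$$

for each valid state transition; the K in (10.94) denotes the combined length of the message 
and termination bits. Accordingly, (10.90) for the termination bits simplifies to 

$$\gamma_j(s', s) = B_j \exp\left(\frac{1}{2} L_c\, \mathbf{r}_j^T \mathbf{c}_j\right), \quad j = L, L+1, \ldots, K-1 \tag{10.95}$$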


Examining (10.92), we find that the factor $A_j$ is independent of the algebraic sign of 
message bit $m_j$; it is therefore a constant. Moreover, from (10.76) and the follow-up 
formulas of (10.77) and (10.78) for updating recursive computations of the forward and 
backward metrics, we find that the joint probability density function $f(s', s, \mathbf{r})$ contains 
the factors 

$$\prod_{j=0}^{L-1} A_j \quad \text{and} \quad \prod_{j=0}^{K-1} B_j$$

With these factors being common to every term in the numerator and denominator of 
(10.68), they both cancel out and may, therefore, be ignored. Thus, we may simplify 
(10.90) and (10.95) into the following two-part formula: 

$$\gamma_j(s', s) = \begin{cases} \exp\left(\frac{1}{2} m_j L_a(m_j)\right) \exp\left(\frac{1}{2} L_c\, \mathbf{r}_j^T \mathbf{c}_j\right), & j = 0, 1, \ldots, L-1 \text{ for message bits} \\ \exp\left(\frac{1}{2} L_c\, \mathbf{r}_j^T \mathbf{c}_j\right), & j = L, L+1, \ldots, K-1 \text{ for termination bits} \end{cases}$$

One last comment is in order. When the original message bits are equally likely, we have 

$$P(m_j) = \frac{1}{2} \quad \text{and} \quad L_a(m_j) = 0 \quad \text{for all } j$$

Under these two conditions, we have a simple expression for the transition metric for the 
entire stream of bits, as shown by 

$$\gamma_j(s', s) = \exp\left(\frac{1}{2} L_c\, \mathbf{r}_j^T \mathbf{c}_j\right), \quad j = 0, 1, \ldots, K-1$$


With the forward and backward recursions as well as the branch metric that ties them 
together all now at hand, we are equipped to finalize the formula for computing the a 
posteriori L-value $L_p(m_j)$ defined back in (10.68). Specifically, using (10.69) and 
(10.76), we may now write 

$$\begin{aligned} L_p(m_j) &= \ln\left(\frac{\sum_{(s', s) \in \Sigma_j^+} f(s_j = s', s_{j+1} = s, \mathbf{r})}{\sum_{(s', s) \in \Sigma_j^-} f(s_j = s', s_{j+1} = s, \mathbf{r})}\right) \\ &= \ln\left(\frac{\sum_{(s', s) \in \Sigma_j^+} f(s', s, \mathbf{r})}{\sum_{(s', s) \in \Sigma_j^-} f(s', s, \mathbf{r})}\right) \\ &= \ln\left(\frac{\sum_{(s', s) \in \Sigma_j^+} \beta_{j+1}(s)\, \gamma_j(s', s)\, \alpha_j(s')}{\sum_{(s', s) \in \Sigma_j^-} \beta_{j+1}(s)\, \gamma_j(s', s)\, \alpha_j(s')}\right) \end{aligned} \tag{10.99}$$

It is the a posteriori L-value $L_p(m_j)$ defined in the last line of (10.99) that is delivered by 
the MAP decoding algorithm given the received vector r. 

Starting with given AWGN channel values, namely $E_s/N_0$ and the received vector $\mathbf{r}_j$ at time-unit j, 
the computational flow diagram of Figure 10.22 provides a visual summary of the key 
recursions involved in using the MAP decoding algorithm. Specifically, the functional blocks 
pertaining to the forward metric $\alpha_{j+1}(s)$, the backward metric $\beta_j(s')$, the branch metric 
$\gamma_j(s', s)$, and the a posteriori L-value $L_p(m_j)$ are all identified together with their respective 
equation numbers. 


The MAP algorithm, credited to Bahl et al. (1974), is roughly three times as 
computationally complex as the Viterbi algorithm. It was on account of this high 
computational complexity that the MAP algorithm was largely ignored in the literature for 
almost two decades. However, its pioneering application in turbo codes by Berrou et al. 
(1993) re-ignited interest in the MAP algorithm, which, in turn, led to the formulation of 
procedures for significant reductions in computational complexity. 

Figure 10.22. Computational flow diagram displaying the key recursions in the MAP algorithm. 

Specifically, we may mention the following two modifications of the MAP algorithm, 
the first one being exact and the second one being approximate: 

Log-MAP Algorithm 

Examination of the forward and backward metrics of the MAP algorithm for 
continuous-output AWGN channels reveals that they are sums of exponential terms, 
one for each valid state transition in the trellis. This finding, in turn, leads to the idea 
of simplifying the MAP computations by making use of the following identity 
(Robertson et al., 1995): 

$$\ln(e^x + e^y) = \max(x, y) + \ln\left(1 + e^{-|x - y|}\right) \tag{10.100}$$

where the computationally difficult operation $\ln(e^x + e^y)$ is replaced by the sum of 
two simpler computations: 

• the max function, max(x, y), which selects the larger of x and y; 

• the correction term $\ln(1 + e^{-|x - y|})$, which may be read from a look-up table. 

The resulting algorithm, called the log-MAP algorithm, is considerably simpler in 
implementation and provides greater numerical stability than the original MAP 
algorithm. We say so because its formulation is based on two relatively simple 
entities: a max function and a look-up table. Note, however, that in developing the 
log-MAP algorithm, no approximations whatsoever are made. 

Max-log-MAP Algorithm 

We may simplify the computational complexity of the MAP decoding algorithm 
even further by ignoring the correction term $\ln(1 + e^{-|x - y|})$ altogether. In effect, we 
simply use the approximation 

$$\ln(e^x + e^y) \approx \max(x, y) \tag{10.101}$$

The correction term, ignored in this approximate formula, is bounded by 

$$0 < \ln\left(1 + e^{-|x - y|}\right) \le \ln(2) = 0.693$$

The approximate formula of (10.101) yields reasonably good results whenever the 
condition 

$$|\max(x, y)| > 7$$

holds. The decoding algorithm that uses the max function max(x, y) in place of 
$\ln(e^x + e^y)$ is called the max-log-MAP algorithm. In this simplified algorithm, the 
max function plays a role similar to the ACS described previously in the Viterbi 
algorithm; we therefore find that the forward recursion in the max-log-MAP 
algorithm is equivalent to a forward Viterbi algorithm, and the backward recursion 
in the max-log-MAP algorithm is equivalent to a Viterbi algorithm performed in the 
backward direction. In other words, computational complexity of the max-log-MAP 
algorithm is roughly twice that of the Viterbi algorithm, thereby providing a 
significant improvement in computational terms over the original MAP decoding 
algorithm. However, unlike the log-MAP algorithm, this improvement is attained at 
the expense of some degradation in decoding performance. 
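
The two modifications differ only in how they evaluate $\ln(e^x + e^y)$, which the following Python sketch makes concrete. The function names `max_star` and `max_log` are hypothetical. A practical log-MAP decoder would typically read the correction term from a look-up table, whereas this sketch computes it directly with `math.log1p`.

```python
import math

def max_star(x: float, y: float) -> float:
    """Exact ln(e^x + e^y): max plus the correction term (log-MAP)."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))

def max_log(x: float, y: float) -> float:
    """Approximation that drops the correction term (max-log-MAP)."""
    return max(x, y)

# The size of the dropped term depends only on the gap |x - y|.
for gap in (0.0, 1.0, 7.0):
    correction = max_star(gap, 0.0) - max_log(gap, 0.0)
    print(f"gap = {gap:3.1f}  correction = {correction:.6f}")
```

The correction is largest, ln 2 ≈ 0.693, when the two arguments are equal, and it falls below 0.001 once they differ by 7 or more. Writing the exact sum as `max(x, y)` plus a bounded term also avoids overflow when x or y is large.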


To develop a detailed mathematical description of the max-log-MAP algorithm, we have to 
come up with simplified computations of the forward metric $\alpha_{j+1}(s)$ and backward metric 
$\beta_j(s')$, both of which play critical roles in computing the log-a-posteriori L-value $L_p(m_j)$ in 
(10.99). To this end, we introduce three new definitions in the log-domain: 

$$\begin{aligned} \alpha_j^*(s') &= \ln \alpha_j(s'), & \text{equivalently, } \alpha_j(s') &= \exp(\alpha_j^*(s')) \\ \beta_{j+1}^*(s) &= \ln \beta_{j+1}(s), & \text{equivalently, } \beta_{j+1}(s) &= \exp(\beta_{j+1}^*(s)) \\ \gamma_j^*(s', s) &= \ln \gamma_j(s', s), & \text{equivalently, } \gamma_j(s', s) &= \exp(\gamma_j^*(s', s)) \end{aligned} \tag{10.104}$$

where the asterisk for all three metrics is intended to signify the use of natural logarithm 
and must, therefore, not be confused with complex conjugation. 

The motivation for these new definitions is to exploit the physical presence of exponentials 
in the forward and backward metrics so as to facilitate applying the approximate formula 
of (10.101). Thus, substituting the recursion of (10.104) into (10.77), we get 

$$\begin{aligned} \alpha_{j+1}^*(s) &= \ln\left(\sum_{s' \in \sigma_j} \gamma_j(s', s)\, \alpha_j(s')\right) \\ &= \ln\left(\sum_{s' \in \sigma_j} \exp\left(\gamma_j^*(s', s) + \alpha_j^*(s')\right)\right) \end{aligned} \tag{10.105}$$

where $\sigma_j$ is a subset of $\Sigma_j$. Hence, application of the approximate formula of (10.101) 
yields 

$$\alpha_{j+1}^*(s) \approx \max_{s' \in \sigma_j}\left(\gamma_j^*(s', s) + \alpha_j^*(s')\right), \quad j = 0, 1, \ldots, K-1 \tag{10.106}$$

Equation (10.106) indicates that, for each path in the trellis from the old state s' at 
time-unit j to the updated state s at time-unit j + 1, the max-log-MAP algorithm adds the branch 
metric $\gamma_j^*(s', s)$ to the old value $\alpha_j^*(s')$ to produce the updated value $\alpha_{j+1}^*(s)$; this update 
is the “maximum” of all the $\alpha^*$ values of the previous paths terminating on the state 
$s_{j+1} = s$, for j = 0, 1, ..., K - 1. The process just described may be thought of as that 
of selecting the one particular path viewed as the “survivor” with all the other paths in the 
trellis reaching the state s being discarded. We may, therefore, view (10.106) as a 
mathematical basis for describing the forward recursion in the max-log-MAP algorithm in 
exactly the same way as the forward recursion in the Viterbi algorithm. 

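Proceeding in a manner similar to that for the forward recursion, we may write 

$$\begin{aligned} \beta_j^*(s') &= \ln\left(\sum_{s \in \sigma_{j+1}} \gamma_j(s', s)\, \beta_{j+1}(s)\right) \\ &= \ln\left(\sum_{s \in \sigma_{j+1}} \exp\left(\gamma_j^*(s', s) + \beta_{j+1}^*(s)\right)\right) \end{aligned} \tag{10.107}$$

whose approximate form is given by 

$$\beta_j^*(s') \approx \max_{s \in \sigma_{j+1}}\left(\gamma_j^*(s', s) + \beta_{j+1}^*(s)\right), \quad j = K-1, \ldots, 1, 0$$

Next, proceeding onto the branch metric, we may similarly write the two-part formula 

$$\gamma_j^*(s', s) = \begin{cases} \frac{1}{2} m_j L_a(m_j) + \frac{1}{2} L_c\, \mathbf{r}_j^T \mathbf{c}_j, & j = 0, 1, \ldots, L-1 \text{ for message bits} \\ \frac{1}{2} L_c\, \mathbf{r}_j^T \mathbf{c}_j, & j = L, L+1, \ldots, K-1 \text{ for termination bits} \end{cases} \tag{10.109}$$

where, for the message bits in the first line of the equation, the additive term $\frac{1}{2} m_j L_a(m_j)$ 
accounts for a priori information. 

At long last, using (10.105), (10.107), and (10.109) in (10.99), we may finally express 
the a posteriori L-value for the log-MAP algorithm as follows: 

$$\begin{aligned} L_p(m_j) &= \ln\left(\frac{\sum_{(s', s) \in \Sigma_j^+} \beta_{j+1}(s)\, \gamma_j(s', s)\, \alpha_j(s')}{\sum_{(s', s) \in \Sigma_j^-} \beta_{j+1}(s)\, \gamma_j(s', s)\, \alpha_j(s')}\right) \\ &= \ln \sum_{(s', s) \in \Sigma_j^+} \exp\left(\beta_{j+1}^*(s) + \gamma_j^*(s', s) + \alpha_j^*(s')\right) \\ &\quad - \ln \sum_{(s', s) \in \Sigma_j^-} \exp\left(\beta_{j+1}^*(s) + \gamma_j^*(s', s) + \alpha_j^*(s')\right) \end{aligned}$$

A couple of reminders: 

• $\Sigma_j^+$ is the set of all state pairs $s_j = s'$ and $s_{j+1} = s$ that correspond to the original 
message bit $m_j = +1$ at time-unit j. 

• $\Sigma_j^-$ is the set of all other state pairs $s_j = s'$ and $s_{j+1} = s$ that correspond to the 
original message bit $m_j = -1$ at time-unit j. 

Correspondingly, the approximate form of $L_p(m_j)$ in the max-log-MAP algorithm is 
defined by 

$$L_p(m_j) \approx \max_{(s', s) \in \Sigma_j^+}\left(\beta_{j+1}^*(s) + \gamma_j^*(s', s) + \alpha_j^*(s')\right) - \max_{(s', s) \in \Sigma_j^-}\left(\beta_{j+1}^*(s) + \gamma_j^*(s', s) + \alpha_j^*(s')\right)$$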


Illustrative Procedure for MAP Decoding in the Log-Domain 


In the preceding section we described three different algorithms for decoding a 
convolutional code, as summarized here: 

1. The BCJR algorithm, which distinguishes itself from the Viterbi algorithm in that it 
performs MAP decoding on a bit-by-bit basis. However, a shortcoming of this 
algorithm is its computational complexity, which, as mentioned previously, is 
roughly three times that of the Viterbi algorithm for the same convolutional code. 

2. The log-MAP algorithm, which simplifies the BCJR algorithm by replacing the 
computationally difficult logarithmic operation, namely $\ln(e^x + e^y)$, with the so-called 
max function plus a look-up table for evaluating $\ln(1 + e^{-|x - y|})$ in accordance with 
(10.100). The attractive feature of this second algorithm is twofold: 

• transformation of the BCJR algorithm into the log-MAP algorithm is exact; 

• its computational complexity is twice that of the Viterbi algorithm, thereby providing 
a significant reduction in complexity compared to the BCJR algorithm. 

3. The max-log-MAP algorithm, which simplifies computational complexity even 
further by doing away with the look-up table; this simplification may result in some 
degradation in decoding performance depending on the application of interest. 

In this section, we illustrate how the simpler of the latter two algorithms, namely the 
max-log-MAP algorithm, is used to decode an RSC code by way of an example. 

EXAMPLE 7 Max-Log-MAP Decoding of Rate-3/8 Recursive Systematic Convolutional 
Code over AWGN Channel 

In this example, we revisit the simple RSC code discussed previously at the tail end of 
Section 10.6 on convolutional codes. 

For convenience of presentation, the two-state RSC encoder of Figure 10.17 is 
reproduced in Figure 10.23a. The message vector applied to the encoder is denoted by 

$$\mathbf{m} = \{m_j\}_{j=0}^{3}$$

which produces the encoded output vector 

$$\mathbf{c} = \left\{c_j^{(0)}, c_j^{(1)}\right\}_{j=0}^{3}$$

Correspondingly, the received vector at the channel output is denoted by 

$$\mathbf{r} = \left\{r_j^{(0)}, r_j^{(1)}\right\}_{j=0}^{3}$$

The first three elements of the message vector m, namely $m_0$, $m_1$, and $m_2$, are message 
bits. The last element, $m_3$, is a termination bit. With the encoded output vector, c, 
consisting of eight bits, it follows that the code rate r = 3/8. 

Figure 10.23. (a) Block diagram of rate-3/8, two-state recursive systematic convolutional (RSC) 
encoder. (b) Trellis graph of the encoder, labeled with the received vector 
r = (+0.8, 0.1; +1.0, -0.5; -1.8, +1.1; +1.6, -1.6), the time-unit j, and decoding phases 0, 1, 
and 2 followed by the terminating phase. 

Figure 10.23b shows the trellis diagram of the RSC encoder. The underlying points 
covering the ways in which the branches of the trellis diagram have been labeled should be 
carefully noted: 

The encoder is initialized to the all-zero state and, on termination of the encoding 
process, it returns to the all-zero state. 

The encoder has a single memory unit; hence, there are only two states, denoted by 
$S_0$, represented by bit 0, and $S_1$, represented by bit 1. 

Figure 10.24 illustrates the four different ways in which the state transitions take place: 

$S_0$ to $S_0$: 0/00 
$S_0$ to $S_1$: 1/11 
$S_1$ to $S_1$: 0/01 
$S_1$ to $S_0$: 1/10 

where, in each case, the first bit on the right-hand side is an input bit and the 
following two bits (shown separately) are encoded bits. Since the encoder is 
systematic, it follows that the encoder input bit and the first encoded bit are exactly 
the same. The remaining second encoded bit is determined by the modulo-2 
recursion: 

$$m_j + b_{j-1} = b_j, \quad j = 0, 1, 2, 3$$

where the initializing bit $b_{-1}$ is 0. The two-bit code is defined by 

$$\mathbf{c}_j = \left(c_j^{(0)}, c_j^{(1)}\right) = (m_j, b_j), \quad j = 0, 1, 2, 3$$

We may thus use the notation $m_j / c_j^{(0)} c_j^{(1)}$ to denote the branch labels. Hence, 
following this notation and the state transitions described in Figure 10.23b, we may 
identify the desired branch labels for the trellis diagram in terms of bits 0 and 1, 
respectively. More specifically, using the mapping rule: levels -1 and +1 for bits 0 
and 1, respectively, we get the branch labels actually described in Figure 10.23b. 

One last point is in order: owing to the use of feedback in the encoder, the lower 
branch leaving each state does not necessarily correspond to a bit 1 (level +1) and 
the upper branch does not necessarily correspond to a bit 0 (level -1). 

Figure 10.24. Illustration of the operations involved in the four possible state transitions: 
(a) 0/00, (b) 1/11, (c) 0/01, (d) 1/10. 
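
The four state transitions follow mechanically from the modulo-2 recursion, as the following Python sketch of the two-state RSC encoder shows. The function names `rsc_encode` and `to_levels` are hypothetical, and appending a single termination bit equal to the current state is an assumption consistent with the requirement that the encoder return to the all-zero state.

```python
def rsc_encode(message_bits):
    """Encode with the two-state RSC code b_j = m_j XOR b_{j-1}, c_j = (m_j, b_j).

    One termination bit is appended so that the encoder returns to state 0.
    Returns (input bits including termination, list of (c0, c1) pairs).
    """
    state = 0  # b_{-1} = 0: the encoder starts in the all-zero state
    inputs, codeword = [], []
    for m in list(message_bits) + [None]:
        if m is None:
            m = state  # termination bit: forces b_j = m XOR state = 0
        b = m ^ state
        inputs.append(m)
        codeword.append((m, b))
        state = b
    return inputs, codeword

def to_levels(bits):
    """Map bit 0 to level -1 and bit 1 to level +1."""
    return [2 * bit - 1 for bit in bits]

inputs, codeword = rsc_encode([0, 1, 0])
```

For the message bits 0, 1, 0 the encoder passes through all four transitions in turn (0/00, 1/11, 0/01, and 1/10), and the termination bit comes out as 1.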


To continue the background material for the example, we need to bring in a mapper that 
transforms the encoded signal into a form suitable for transmission over the AWGN channel. 
To this end, consider the simple example of binary PSK as the mapper. We may then express 
the SNR at the channel output (i.e., receiver input) as follows (see Problem 10.35): 

$$(\text{SNR})_{\text{channel output}} = \frac{E_s}{N_0} = r\left(\frac{E_b}{N_0}\right)$$

where $E_b$ is the signal energy per message bit applied to the encoder input, and r is the code 
rate of the convolutional encoder. Thus, for SNR = 1/2, that is, -3.01 dB, and r = 3/8, the 
required $E_b/N_0$ is 4/3. 
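
A short worked step shows what these numbers imply for the channel reliability factor $L_c = 4E_s/N_0$ used in the branch metric. It rests on the assumption, inferred from the quoted values rather than stated explicitly at this point, that the channel-output SNR is identified with $E_s/N_0$:

$$\frac{E_b}{N_0} = \frac{1}{r} \cdot \frac{E_s}{N_0} = \frac{8}{3} \cdot \frac{1}{2} = \frac{4}{3}, \qquad L_c = \frac{4E_s}{N_0} = 4 \cdot \frac{1}{2} = 2$$

With $L_c = 2$ and equally likely message bits, the log-domain branch metric reduces to $\gamma_j^*(s', s) = \frac{1}{2} L_c\, \mathbf{r}_j^T \mathbf{c}_j = \mathbf{r}_j^T \mathbf{c}_j$. For instance, a received pair (+0.8, 0.1) on a branch with code levels (-1, -1) gives -0.8 - 0.1 = -0.9.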

In transmitting the coded vector c over the AWGN environment, it is assumed that the 
received signal vector, normalized with respect to $\sqrt{E_s}$, is given by 

$$\mathbf{r} = (\underbrace{+0.8,\ 0.1}_{\mathbf{r}_0};\ \underbrace{+1.0,\ -0.5}_{\mathbf{r}_1};\ \underbrace{-1.8,\ 1.1}_{\mathbf{r}_2};\ \underbrace{+1.6,\ -1.6}_{\mathbf{r}_3})$$

The received vector r is included at the top of the trellis diagram in Figure 10.23b. 

We are now fully prepared to proceed with decoding the received vector r using the 
max-log-MAP algorithm described next, assuming the message bits are equally likely. 


Computation of the Decoded Message Vector 

To prepare the stage for this computation, we find it convenient to reproduce the following 
equations, starting with the formula for the log-domain transition metrics: 

$$\gamma_j^*(s', s) = \frac{1}{2} L_c\, \mathbf{r}_j^T \mathbf{c}_j, \quad j = 0, 1, \ldots, K-1$$

Then for the log-domain forward metrics: 

$$\alpha_{j+1}^*(s) \approx \max_{s' \in \sigma_j}\left(\gamma_j^*(s', s) + \alpha_j^*(s')\right), \quad j = 0, 1, \ldots, K-1$$

Next, for the log-domain backward metrics: 

$$\beta_j^*(s') \approx \max_{s \in \sigma_{j+1}}\left(\gamma_j^*(s', s) + \beta_{j+1}^*(s)\right)$$

And finally for computation of the a posteriori L-values: 

$$L_p(m_j) = \max_{(s', s) \in \Sigma_j^+}\left(\beta_{j+1}^*(s) + \gamma_j^*(s', s) + \alpha_j^*(s')\right) - \max_{(s', s) \in \Sigma_j^-}\left(\beta_{j+1}^*(s) + \gamma_j^*(s', s) + \alpha_j^*(s')\right)$$

Matlab code has been used to perform the computation, starting with the initial 
conditions for the forward and backward metrics, $\alpha_0(s)$ and $\beta_K(s')$, defined in (10.79) 
and (10.80), respectively. The results of the computation are summarized as follows: 


Log-domain transition metrics

Gamma 0:
    $\gamma_0^*(S_0, S_0) = -0.9$
    $\gamma_0^*(S_0, S_1) = 0.9$

Gamma 1:
    $\gamma_1^*(S_0, S_0) = -0.5$
    $\gamma_1^*(S_1, S_0) = 1.5$
    $\gamma_1^*(S_0, S_1) = 0.5$
    $\gamma_1^*(S_1, S_1) = -1.5$

Gamma 2:
    $\gamma_2^*(S_0, S_0) = 0.7$
    $\gamma_2^*(S_1, S_0) = -2.9$
    $\gamma_2^*(S_0, S_1) = -0.7$
    $\gamma_2^*(S_1, S_1) = 2.9$

Gamma 3:
    $\gamma_3^*(S_0, S_0) = 0$
    $\gamma_3^*(S_1, S_0) = 3.2$

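Log-domain forward metrics

Alpha 0:
    $\alpha_0^*(S_0) = 0$
    $\alpha_0^*(S_1) = 0$

Alpha 1:
    $\alpha_1^*(S_0) = -0.9$
    $\alpha_1^*(S_1) = 0.9$

Alpha 2:
    $\alpha_2^*(S_0) = 2.4$
    $\alpha_2^*(S_1) = -0.4$

Log-domain backward metrics

Beta K:
    $\beta_K^*(S_0) = 0$
    $\beta_K^*(S_1) = 0$

Beta 3:
    $\beta_3^*(S_0) = 0$
    $\beta_3^*(S_1) = 3.2$

Beta 2:
    $\beta_2^*(S_0) = 2.5$
    $\beta_2^*(S_1) = 6.1$

Beta 1:
    $\beta_1^*(S_0) = 6.6$
    $\beta_1^*(S_1) = 4.6$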
A posteriori L-values

    $L_p(m_0) = -0.2$
    $L_p(m_1) = 0.2$
    $L_p(m_2) = -0.8$

Final decision

Decoded version of the original message vector

$$\hat{\mathbf{m}} = [-1, +1, -1]$$

In binary form, we may equivalently write

$$\hat{\mathbf{m}} = [0, 1, 0]$$
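
The computation summarized above can be reproduced with a short script. The following is a minimal sketch in Python (not the MATLAB code referred to in the text); the function names are hypothetical. It assumes a two-state trellis whose state is the previous parity bit, a scaling `lc = 2.0` chosen because it reproduces the tabulated transition metrics, and metrics of minus infinity for states that the terminated trellis cannot occupy at the two ends.

```python
import math

NEG_INF = -math.inf

def branches():
    """Trellis branches (s_prev, s_next, m, c) of the two-state RSC code.

    Assumption: the state is the previous parity bit, the parity bit is
    (message bit XOR state), and bits map as 0 -> -1, 1 -> +1.
    """
    out = []
    for s_prev in (0, 1):
        for bit in (0, 1):
            parity = bit ^ s_prev
            out.append((s_prev, parity, 2 * bit - 1, (2 * bit - 1, 2 * parity - 1)))
    return out

def max_log_map(r, lc=2.0):
    """Return (gamma, alpha, beta, L) for received pairs r; lc is an assumed scaling."""
    K, br = len(r), branches()
    # Log-domain transition metrics with equally likely message bits.
    gamma = [{(sp, sn): 0.5 * lc * (rj[0] * c[0] + rj[1] * c[1])
              for sp, sn, _, c in br} for rj in r]
    alpha = [[NEG_INF, NEG_INF] for _ in range(K + 1)]
    beta = [[NEG_INF, NEG_INF] for _ in range(K + 1)]
    alpha[0][0] = 0.0   # encoder starts in state S0
    beta[K][0] = 0.0    # termination bit returns the encoder to S0
    for j in range(K):                      # forward recursion
        for sp, sn, _, _ in br:
            alpha[j + 1][sn] = max(alpha[j + 1][sn], gamma[j][(sp, sn)] + alpha[j][sp])
    for j in range(K - 1, -1, -1):          # backward recursion
        for sp, sn, _, _ in br:
            beta[j][sp] = max(beta[j][sp], gamma[j][(sp, sn)] + beta[j + 1][sn])
    L = []
    for j in range(K):                      # a posteriori L-values
        best = {+1: NEG_INF, -1: NEG_INF}
        for sp, sn, m, _ in br:
            best[m] = max(best[m], beta[j + 1][sn] + gamma[j][(sp, sn)] + alpha[j][sp])
        L.append(best[+1] - best[-1])
    return gamma, alpha, beta, L

r = [(0.8, 0.1), (1.0, -0.5), (-1.8, 1.1), (1.6, -1.6)]
gamma, alpha, beta, L = max_log_map(r)
decoded = [1 if value > 0 else -1 for value in L[:3]]   # drop the termination bit
print([round(value, 1) for value in L[:3]], decoded)
```

Run on the received vector of this example, the sketch prints the L-values -0.2, 0.2, -0.8 and the decisions -1, +1, -1, in agreement with the summary.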


In arriving at the decoded output of (10.119) we made use of the termination bit, $m_3$.
Although $m_3$ is not a message bit, the same procedure was used to calculate its a
posteriori L-value. Lin and Costello (2004) showed that this kind of calculation is a
necessary requirement in the iterative decoding of turbo codes. Specifically, with the
turbo decoder consisting of two stages, “soft-output” a posteriori L-values are
passed as a priori inputs to a second decoder.

In Example 7, we focused attention on the application of the max-log-MAP 
algorithm to decode the rate-3/8 RSC code produced by the two-state encoder of 
Figure 10.23a. The procedure described herein, embodying six steps, applies 
equally well to the log-MAP algorithm with no approximations. In Problem 10.34 at 
the end of the chapter, the objective is to show that the corresponding decoded 
output is (+1, +1, -1), which is different from that of Example 7. Naturally, in 
arriving at this new result, the calculations are somewhat more demanding but more 
accurate in the final decision-making. 


New Generation of Probabilistic Compound Codes 


Traditionally, the design of good codes has been tackled by constructing codes with a great 
deal of algebraic structure, for which there are feasible decoding schemes. Such an 
approach is exemplified by the linear block codes, cyclic codes, and convolutional codes 
discussed in preceding sections of this chapter. The difficulty with these traditional codes 
is that, in an effort to approach the theoretical limit for Shannon’s channel capacity, we 
need to increase the codeword length of a linear block code or the constraint length of a 
convolutional code, which, in turn, causes the computational complexity of a maximum 
likelihood or maximum a posteriori decoder to increase exponentially. Ultimately, we 
reach a point where complexity of the decoder is so high that it becomes physically 
impractical. 

Ironically enough, in his 1948 paper, Shannon showed that the “average” performance
of a randomly chosen ensemble of codes results in an exponentially decreasing decoding
error with increasing block length. Unfortunately, as it was with his coding theorem,
Shannon did not provide guidance on how to construct randomly chosen codes.


Interest in the use of randomly chosen codes was essentially dormant for a long time until 
the new idea of turbo coding was described by Berrou et al. (1993); that idea was based on 
two design initiatives: 

1. The design of a good code, the construction of which is characterized by random-like
properties.

2. The iterative design of a decoder that makes use of soft-output values by exploiting
the maximum a posteriori decoding algorithm due to Bahl et al. (1974).

By exploiting these two ideas, it was experimentally demonstrated that turbo coding can 
approach the Shannon limit at a computational cost that would have been infeasible with 
traditional algebraic codes. Therefore, it can be said that the invention of turbo coding 
deserves to be ranked among the major technical achievements in the design of 
communication systems in the 20th century. 

What is also remarkable is the fact that the discovery of turbo coding and iterative
decoding rekindled theoretical as well as practical interest in some prior work by Gallager
(1962, 1963) on LDPC codes. These codes also possess the information-processing power 
to approach the Shannon limit in their own individual ways. The important point to note 
here is the fact that both turbo codes and LDPC codes are capable of approaching the 
Shannon limit at a similar level of computational complexity, provided that they both have 
a sufficiently long codeword. Specifically, turbo codes require a long turbo interleaver, 
whereas LDPC codes require a longer codeword at a given code rate (Hanzo, 2012). 

We thus have two basic classes of probabilistic compound coding techniques, turbo
codes and LDPC codes, which complement each other.


With these introductory remarks, the stage is set for the study of turbo codes in Section 
10.12, followed by LDPC codes in Section 10.14. 


Turbo Codes 


As mentioned in the preceding section, the use of a good code with random-like properties is
basic to turbo coding. In the first successful implementation of turbo codes, Berrou et al.
achieved this design objective by using concatenated codes. The original idea of
concatenated codes was conceived by Forney (1966). To be more specific, concatenated
codes can be of two types: parallel or serial. The type of concatenated codes used by
Berrou et al. was of the parallel type, which is discussed in this section. Discussion of the
serial type of concatenated codes will be taken up in Section 10.16.

Figure 10.25 Block diagram of turbo encoder of the parallel type.


Figure 10.25 depicts the most basic form of a turbo code generator that consists of two 
constituent systematic encoders, which are concatenated by means of an interleaver. 

The interleaver is an input-output mapping device that permutes the ordering of a 
sequence of symbols from a fixed alphabet in a completely deterministic manner; that is, it 
takes the symbols at the input and produces identical symbols at the output but in a different 
temporal order. Turbo codes use a pseudo-random interleaver, which operates only on the
systematic (i.e., message) bits. (Interleavers are discussed in Appendix F.) The size of the 
interleaver used in turbo codes is typically very large, on the order of several thousand bits. 
There are two reasons for the use of an interleaver in a turbo code: 

1. The interleaver ties together errors that are easily made in one half of the turbo code
to errors that are exceptionally unlikely to occur in the other half; this is indeed one
reason why the turbo code performs better than a traditional code.

2. The interleaver provides robust performance with respect to mismatched decoding, a
problem that arises when the channel statistics are not known or have been
incorrectly specified.
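
The deterministic yet random-looking reordering described above can be sketched in Python as follows. The helper names and the use of a seeded shuffle are assumptions made for illustration; the text does not specify how the pseudo-random permutation is generated.

```python
import random

def make_interleaver(size, seed=0):
    """Return a fixed pseudo-random permutation of range(size).

    The seed makes the mapping completely deterministic, so the receiver
    can rebuild exactly the same permutation.
    """
    perm = list(range(size))
    random.Random(seed).shuffle(perm)
    return perm

def interleave(symbols, perm):
    """Output position k carries the input symbol at position perm[k]."""
    return [symbols[i] for i in perm]

def deinterleave(symbols, perm):
    """Undo interleave(): put every symbol back in its original position."""
    out = [None] * len(perm)
    for out_pos, in_pos in enumerate(perm):
        out[in_pos] = symbols[out_pos]
    return out

perm = make_interleaver(8, seed=1)
bits = [1, 0, 1, 1, 0, 0, 1, 0]
shuffled = interleave(bits, perm)
restored = deinterleave(shuffled, perm)
print(shuffled, restored == bits)
```

The symbols at the output are identical to those at the input; only their temporal order changes, and de-interleaving with the same permutation restores the original order.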

Ordinarily, but not necessarily, the same code is used for both constituent encoders in 
Figure 10.25. The constituent codes recommended for turbo codes are short constraint- 
length RSC codes. The reason for making the convolutional codes recursive (i.e., feeding 
one or more of the tap outputs in the shift register back to the input) is to make the internal 
state of the shift register depend on past outputs. This affects the behavior of the error 
patterns, with the result that a better performance of the overall coding strategy is attained. 

Two-State Turbo Encoder 

Figure 10.26 shows the block diagram of a specific turbo encoder using an identical pair of
two-state RSC constituent encoders. The generator matrix of each constituent encoder is
given by

$$\mathbf{G}(D) = \left(1,\ \frac{1}{1 + D}\right)$$

The input sequence of bits has length K = 4, made up of three message bits and one
termination bit. (This RSC encoder was discussed previously in Section 10.9.) The input
vector is given by

$$\mathbf{m} = (m_0, m_1, m_2, m_3)$$

Figure 10.26 Two-state turbo encoder for Example 8.


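The parity-check vector produced by the first constituent encoder is given by

$$\mathbf{b}^{(1)} = (b_0^{(1)}, b_1^{(1)}, b_2^{(1)}, b_3^{(1)})$$

Similarly, the parity-check vector produced by the second constituent encoder is given by

$$\mathbf{b}^{(2)} = (b_0^{(2)}, b_1^{(2)}, b_2^{(2)}, b_3^{(2)})$$

The transmitted code vector is therefore defined by

$$\mathbf{c} = (\mathbf{c}^{(0)}, \mathbf{c}^{(1)}, \mathbf{c}^{(2)})$$

With the convolutional code being systematic, we thus have

$$\mathbf{c}^{(0)} = \mathbf{m}$$

As for the remaining two sub-vectors constituting the code vector c, they are defined by

$$\mathbf{c}^{(1)} = \mathbf{b}^{(1)}$$

and

$$\mathbf{c}^{(2)} = \mathbf{b}^{(2)}$$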

The transmitted code vector c is therefore made up of 12 bits. However, recalling that the
termination bit $m_3$ is not a message bit, it follows that the code rate of the turbo code
described in Figure 10.26 is

$$r = \frac{3}{12} = \frac{1}{4}$$

One last point is in order: with each RSC encoder having two states, the interleaver has a
two-by-two (row-column) structure. Note also that the interleaver in Figure 10.26 is
denoted by the symbol π, which is a common usage; this practice is adopted throughout
the book.
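
To make the construction of the 12-bit code vector concrete, the following Python sketch encodes three message bits with the two-state encoder. It is an illustration under stated assumptions, not the book's implementation: the function names are hypothetical, the parity recursion is the one implied by G(D) = (1, 1/(1 + D)), the two-by-two interleaver is assumed to write row by row and read column by column, and only the first constituent encoder is terminated.

```python
from fractions import Fraction

def rsc_parity(bits):
    """Parity stream of G(D) = (1, 1/(1+D)): b_j = m_j XOR b_{j-1}, with b_{-1} = 0."""
    state, out = 0, []
    for bit in bits:
        state ^= bit
        out.append(state)
    return out

def row_column_interleave(bits, rows=2, cols=2):
    """Assumed row-column interleaver: write row by row, read column by column."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def turbo_encode(message):
    """Return (c0, c1, c2) for three message bits plus one termination bit."""
    # The termination bit equals the current state, which drives encoder 1 to state 0.
    tail = rsc_parity(message)[-1]
    m = list(message) + [tail]
    c0 = m                                          # systematic bits
    c1 = rsc_parity(m)                              # parity bits of encoder 1
    c2 = rsc_parity(row_column_interleave(m))       # parity bits of encoder 2
    return c0, c1, c2

c0, c1, c2 = turbo_encode([1, 0, 1])
code_rate = Fraction(3, len(c0) + len(c1) + len(c2))
print(c0, c1, c2, code_rate)
```

For the message bits 1, 0, 1 the sketch yields c0 = [1, 0, 1, 0], c1 = [1, 1, 0, 0], and c2 = [1, 0, 0, 0], that is, 12 transmitted bits and a code rate of 1/4.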


In Figure 10.25, the input data stream is applied directly to encoder 1 and the pseudo- 
randomly reordered version of the same data stream is applied to encoder 2. The 
systematic bits (i.e., original message bits) and the two sets of parity-check bits generated 
by the two encoders constitute the output of the turbo encoder. Although the constituent
codes are convolutional, in reality, turbo codes are block codes with the block size being 
determined by the periodic size of the interleaver. Moreover, both RSC encoders in Figure 
10.25 are linear. We may therefore describe turbo codes generally as linear block codes. 

The block nature of the turbo code raises a practical issue: 


The common practice is to initialize the encoder to the all-zero state and then encode the 
data. After encoding a certain number of data bits, a number of tail bits are added so as to 
make the encoder return to the all-zero state at the end of each block; thereafter, the cycle 
is repeated. The termination approaches of turbo codes include the following: 

• A simple approach is to terminate the first RSC code in the encoder and leave the
second one unterminated. A drawback of this approach is that the bits at the end of
the block due to the second RSC code are more vulnerable to noise than the other
bits. Experimental work has shown that turbo codes exhibit a leveling off in
performance as the SNR increases. This behavior is not a true error floor; rather, it
only has the appearance of one when compared with the steep drop in error
performance at low SNR. The error floor is affected by a number of factors, the
dominant one of which is the choice of interleaver.

• A more refined approach is to terminate both constituent codes in the encoder in a 
symmetric manner. Through the combined use of a good interleaver and dual 
termination, the error floor can be reduced by an order of magnitude compared to 
the simple termination approach. 

In the original version of the turbo encoder described in Berrou et al. (1993), the parity- 
check bits generated by the two encoders in Figure 10.25 were punctured prior to data 
transmission over the channel to maintain the rate at 1/2. A punctured code is constructed 
by deleting certain parity-check bits, thereby increasing the data rate; the message bits in 
the puncturing process are of course unaffected. Basically, puncturing is the inverse of
extending a code. It should, however, be emphasized that the use of a puncture map is not
a necessary requirement for the generation of turbo codes.
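
As an illustration of how puncturing raises the rate from 1/3 to 1/2, the Python sketch below keeps every systematic bit and transmits the parity bits of the two encoders alternately. The alternating puncture map and the bit values are assumptions chosen for the example; the text does not spell out the map used by Berrou et al.

```python
from fractions import Fraction

def puncture(systematic, parity1, parity2):
    """Keep all systematic bits; keep parity bits alternately from encoder 1 and encoder 2."""
    out = []
    for j, bit in enumerate(systematic):
        out.append(bit)                                        # message bits are unaffected
        out.append(parity1[j] if j % 2 == 0 else parity2[j])   # one parity bit per message bit
    return out

systematic = [1, 0, 1, 1, 0, 0]
parity1 = [1, 1, 0, 1, 1, 1]   # hypothetical parity bits of encoder 1
parity2 = [0, 1, 1, 0, 0, 1]   # hypothetical parity bits of encoder 2

sent = puncture(systematic, parity1, parity2)
rate_unpunctured = Fraction(len(systematic), 3 * len(systematic))
rate_punctured = Fraction(len(systematic), len(sent))
print(sent, rate_unpunctured, rate_punctured)
```

Half of the parity bits are deleted, so each message bit is accompanied by a single parity bit and the rate becomes 1/2.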

As mentioned previously, the encoding scheme of Figure 10.25 is of the parallel 
concatenation type, the novelty of which is twofold: 

• the use of RSC codes and 

• the insertion of a pseudo-random interleaver between the two encoders. 

The net result of parallel concatenation is a turbo code that appears essentially random to 
the channel by virtue of the pseudo-random interleaver, yet it possesses sufficient structure 
for the decoding to be physically realizable. Coding theory asserts that a code chosen at 
random is capable of approaching Shannon’s channel capacity, provided that the block 
size is sufficiently large. This is indeed the reason behind the impressive performance of 
turbo codes, as discussed next. 


Figure 10.27 shows the error performance of a 1/2-rate turbo code with a large block size 
for binary data transmission over an AWGN channel. The code uses an interleaver of 
size 65,536 bits and a MAP decoder. 
Figure 10.27 Noise performance of rate-1/2 turbo code and uncoded transmission
for AWGN channel; the figure also includes Shannon’s theoretical limit on channel
capacity for code rate r = 1/2.


For the purpose of comparison, Figure 10.27 also includes two other curves for the 
same AWGN channel: 

• uncoded transmission (i.e., code rate r = 1); 

• Shannon’s theoretical limit for code rate 1/2, which follows from Figure 5.18b. 

From Figure 10.27, we may draw two important conclusions: 

1. Although the BER for the turbo-coded transmission is significantly higher than that
for uncoded transmission at low $E_b/N_0$, the BER for the turbo-coded transmission
drops very rapidly once a critical value of $E_b/N_0$ has been reached.

2. At a BER of $10^{-5}$, the turbo code is less than 0.5 dB from Shannon’s theoretical
limit.

Note, however, that attaining this highly impressive performance requires that the size of the
interleaver or, equivalently, the block length of the turbo code be large. Also, the large 
number of iterations needed to improve performance increases the decoder latency. This 
drawback is due to the fact that the digital processing of information does not lend itself 
readily to the application of feedback, which is a distinctive feature of the turbo decoder. 


Before proceeding to describe the operation of the turbo decoder, we find it desirable to 
introduce the notion of extrinsic information. The most convenient representation for this 
new concept is in terms of the log-likelihood ratio, in which case extrinsic information is 
computed as the difference between two a posteriori L-values as depicted in Figure 10.28. 
Formally, extrinsic information, generated by a decoding stage for a set of systematic 
(message) bits, is defined as follows: 
Figure 10.28 Block diagram for illustrating the concept of extrinsic information.


In effect, extrinsic information is the incremental information gained by exploiting the 
dependencies that exist between a message bit of interest and incoming raw data bits 
processed by the decoder. Extrinsic information plays a key role in the iterative decoding 
process, as discussed next. 


Figure 10.29a shows the block diagram of the two-stage turbo decoder. Using a MAP 
decoding algorithm discussed in Section 10.9, the decoder operates on noisy versions of 
the systematic bits and the two sets of parity-check bits in two decoding stages to produce 
an estimate of the original message bits. 

A distinctive feature of the turbo decoder that is immediately apparent from the block 
diagram of Figure 10.29a is the use of feedback, manifesting itself in producing extrinsic 
information from one decoder to the next in an iterative manner. In a way, this decoding 
process is analogous to the feedback of exhaust gases experienced in a turbo-charged 
engine; indeed, turbo codes derive their name from this analogy. In other words, the term 
“turbo” in turbo codes has more to do with the decoding rather than the encoding process. 

In operational terms, the turbo decoder in Figure 10.29a operates on noisy versions of
the following inputs, obtained by demultiplexing the channel output, $\mathbf{r}_j$:

• systematic (i.e., message) bits, denoted by $\mathbf{r}_j^{(0)}$;

• parity-check bits corresponding to encoder 1 in Figure 10.25, denoted by $\mathbf{r}_j^{(1)}$;

• parity-check bits corresponding to encoder 2 in Figure 10.25, denoted by $\mathbf{r}_j^{(2)}$.

The net result of the decoding algorithm, given the received vector $\mathbf{r}_j$, is an estimate of the
original message vector, namely $\hat{\mathbf{m}}$, which is delivered at the decoder output to the user.

Another important point to note in the turbo decoder of Figure 10.29a is the way in 
which the interleaver and de-interleaver are positioned inside the feedback loop. Bearing 
in mind the fact that the definition of extrinsic information requires the use of intrinsic 
information, we see that decoder 1 operates on three inputs: 

• the noisy systematic (i.e., original message) bits, 

• the noisy parity-check bits due to encoder 1 , and 

• de-interleaved extrinsic information computed by decoder 2. 
Figure 10.29 (a) Block diagram of turbo decoder, whose output is the estimate of the message
vector. (b) Extrinsic form of turbo decoder, comprising decoding stage 1 and decoding stage 2,
where I stands for interleaver, D for de-interleaver, and BCJR for the BCJR algorithm for
log-MAP decoding. In part (b), the switch is closed at time-unit j = 0 and $\tilde{L}_2(m_j)$ is set to 0.


In a complementary manner, decoder 2 operates on two inputs of its own: 

• the noisy parity-check bits due to encoder 2 and 

• the interleaved version of the extrinsic information computed by decoder 1 . 

For this iterative exchange of information between the two decoders inside the feedback 
loop to continuously reinforce each other, the de-interleaver and interleaver would have to 
separate the two decoders in the manner depicted in Figure 10.29a. Moreover, the 
structure of the decoder in the receiver is configured to be consistent with the structure of 
the encoder in the transmitter. 


To put the two-stage turbo decoding process just described on a mathematical basis, we
structure the flow of information around the feedback loop as depicted in Figure 10.29a.
For the sake of simplicity without loss of generality, we assume the use of a code rate
r = 1/3 parallel concatenated convolutional code without puncturing. At time-unit j, let

• $\mathbf{r}_j^{(0)}$ denote the noisy vector of systematic bits,

• $\mathbf{r}_j^{(1)}$ denote the noisy vector of parity-check bits produced by encoder 1, and

• $\mathbf{r}_j^{(2)}$ denote the noisy vector of parity-check bits produced by encoder 2.

The notations described herein are consistent with those adopted in the encoder of Figure
10.25. Moreover, it is assumed that all three vectors, $\mathbf{r}_j^{(0)}$, $\mathbf{r}_j^{(1)}$, $\mathbf{r}_j^{(2)}$, are of
dimensionality K.

Proceeding with the analysis, decoder 1 in Figure 10.29b uses the BCJR decoding
algorithm to produce a “soft estimate” of systematic bit $m_j$ by computing the a posteriori
L-values for decoder 1, namely
where $\tilde{L}_2(\mathbf{m})$ denotes the extrinsic information about the message vector m that is
computed by decoder 2. Note also that in (10.120) we have used the usual mapping: +1 for
bit 1 and -1 for bit 0. Assuming that the K message bits are statistically independent, the
overall a posteriori L-value computed by decoder 1 is given by the summation:

$$L_1(\mathbf{m}) = \sum_{j=0}^{K-1} L_1(m_j)$$

Accordingly, the extrinsic information about the message vector m computed by decoder 1
is given by the difference

$$\tilde{L}_1(\mathbf{m}) = L_1(\mathbf{m}) - \tilde{L}_2(\mathbf{m})$$

where $\tilde{L}_2(\mathbf{m})$ is to be defined.

Before proceeding to use (10.122) in the second decoding stage, the extrinsic
information $\tilde{L}_1(\mathbf{m})$ is reordered (i.e., de-interleaved) to compensate for the pseudo-random
interleaving introduced originally in the turbo encoder in the manner indicated in
Figure 10.29b. In addition to $\tilde{L}_1(\mathbf{m})$, the input applied to decoder 2 also includes the vector
of noisy parity-check bits $\mathbf{r}^{(2)}$. Accordingly, by using the BCJR algorithm, decoder 2
produces a more refined soft estimate of the message vector m. Next, as indicated in Figure
10.29b, this refined estimate of the message vector is re-interleaved to compute the a
posteriori L-values for decoder 2, namely

$$L_2(\mathbf{m}) = \sum_{j=0}^{K-1} L_2(m_j)$$

where

$$L_2(m_j) = \ln\left(\frac{P(m_j = +1 \mid \mathbf{r}^{(2)}, \tilde{L}_1(\mathbf{m}))}{P(m_j = -1 \mid \mathbf{r}^{(2)}, \tilde{L}_1(\mathbf{m}))}\right), \quad j = 0, 1, \ldots, K-1$$

Accordingly, the extrinsic information $\tilde{L}_2(\mathbf{m})$ fed back to the input of decoder 1 is given by

$$\tilde{L}_2(\mathbf{m}) = L_2(\mathbf{m}) - \tilde{L}_1(\mathbf{m})$$

and with it the feedback loop, embodying constituent decoders 1 and 2, is closed.
As indicated in Figure 10.29b, the decoding process is initiated by setting the a
posteriori extrinsic L-value

$$\tilde{L}_2(m_j) = 0, \quad \text{for } j = 0$$

The decoding process is stopped when it reaches a point at which no further improvement
in performance is attainable. At this point, an estimate of the message vector m is
computed by hard-limiting the a posteriori L-value at the output of decoder 2, yielding

$$\hat{\mathbf{m}} = \mathrm{sgn}(L_2(\mathbf{m}))$$
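
The flow of information around the feedback loop can be sketched in Python as follows. This is only a skeleton under stated assumptions: `decode1` and `decode2` are hypothetical callables standing in for the two BCJR stages (each maps a list of a priori L-values to a list of a posteriori L-values), the permutation `perm` plays the role of the interleaver, decoder 2 is assumed to work in interleaved order, and a fixed iteration count replaces the stopping rule. The toy decoders used at the end simply add fixed channel L-values and are not real constituent decoders.

```python
def turbo_iterations(decode1, decode2, perm, n_iter=4):
    """Exchange extrinsic L-values between two decoding stages; return (m_hat, L2)."""
    K = len(perm)
    inv = [0] * K
    for out_pos, in_pos in enumerate(perm):
        inv[in_pos] = out_pos
    ext2 = [0.0] * K                                    # initial condition: extrinsic L-values set to 0
    for _ in range(n_iter):
        post1 = decode1(ext2)                           # a posteriori L-values of decoder 1
        ext1 = [p - e for p, e in zip(post1, ext2)]     # extrinsic information of decoder 1
        ext1_i = [ext1[i] for i in perm]                # interleave before decoder 2
        post2_i = decode2(ext1_i)                       # a posteriori L-values of decoder 2
        ext2_i = [p - e for p, e in zip(post2_i, ext1_i)]
        ext2 = [ext2_i[inv[k]] for k in range(K)]       # de-interleave before decoder 1
    post2 = [post2_i[inv[k]] for k in range(K)]
    m_hat = [1 if value >= 0 else -1 for value in post2]   # hard-limit the decoder 2 output
    return m_hat, post2

# Toy stand-ins for the two stages (hypothetical): a posteriori = a priori + channel L-value.
perm = [2, 0, 3, 1]
ch1 = [1.0, -2.0, 0.5, -0.5]
ch2 = [0.5, 1.0, -2.0, -1.0]
ch2_i = [ch2[i] for i in perm]

m_hat, post2 = turbo_iterations(
    lambda la: [a + c for a, c in zip(la, ch1)],
    lambda la: [a + c for a, c in zip(la, ch2_i)],
    perm,
)
print(m_hat, post2)
```

Subtracting the incoming extrinsic values before passing information on ensures that each stage hands over only what it has newly learned, which is the role the two difference equations play in the loop.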

To conclude the discussion on turbo decoding, two other points are noteworthy: 

1. Although the noisy vector of systematic bits $\mathbf{r}^{(0)}$ is applied only to decoder 1, its
influence on decoder 2 manifests itself indirectly through the a posteriori extrinsic
L-value, $\tilde{L}_1(\mathbf{m})$, computed by decoder 1.

2. Equations (10.122) and (10.125) assume that the a posteriori extrinsic L-values,
$\tilde{L}_1(\mathbf{m})$ and $\tilde{L}_2(\mathbf{m})$, passed between decoders 1 and 2, are statistically independent
of the message vector m. In reality, however, this condition applies only to the first
iteration of the decoding process. Thereafter, the extrinsic information becomes less
helpful in realizing successively more reliable estimates of the message vector m.

UMTS Codec Using Binary PSK Modulation 

In this example, we study the Universal Mobile Telecommunications Systems (UMTS) 
standard’s codec. To simplify the study, binary PSK modulation is used for data transmission 
over an AWGN channel. The basic RSC encoder of the UMTS turbo codes is as follows: 

code-rate r = 1/3 

constraint length v = 4 

memory length m = 3 


Figure 10.30a shows the block diagram of the UMTS turbo encoder, which consists of two 
identical concatenated RSC encoders, operating in parallel with an interleaver separating 
them. To be specific: 

• Each encoder is made up of a linear feedback shift register (LFSR) whose number of
flip-flops m = 3; in each LFSR, therefore, we have a finite-state machine with

$$2^m = 2^3 = 8 \text{ states}$$

• The encoding process is initialized by setting each LFSR to the all-zero state.

• To activate the encoding process, the two switches in Figure 10.30a are closed,
thereby applying the message vector m to the top RSC encoder and applying the
interleaved version of m, namely n, to the bottom RSC encoder. The length of m is
denoted by K.

• Each RSC constituent encoder produces a sequence of parity-check bits, the length 
of which is K + m. 

• Once the encoding process is completed, a set of m bits is appended to each block of 
encoded bits, so as to force each LFSR back to the initial all-zero state. 





Figure 10.30 Block diagram of UMTS codec. (a) Encoder. (b) Decoder. Notes: 1. The received
vectors $\{\mathbf{r}^{(0)}, \mathbf{z}^{(1)}, \mathbf{r}^{(1)}, \mathbf{z}^{(2)}, \mathbf{r}^{(2)}\}$ correspond to the transmitted vectors
$\{\mathbf{c}^{(0)}, \mathbf{t}^{(1)}, \mathbf{c}^{(1)}, \mathbf{t}^{(2)}, \mathbf{c}^{(2)}\}$. 2. The block labeled π: interleaver. The block labeled $\pi^{-1}$:
de-interleaver.




From this description, it is apparent that the overall code-rate of the turbo code is lower
than the UMTS code-rate, namely 1/3, as shown by

$$r_{\text{overall}} = \frac{K}{3K + 4m}$$

Note that if we set the memory length m = 0, the code rate $r_{\text{overall}}$ is increased again to 1/3.
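
The size of the rate loss caused by the tail bits can be checked numerically. The short Python sketch below evaluates K/(3K + 4m) with exact fractions; the function name is hypothetical, and the two block lengths are the end points of the UMTS range.

```python
from fractions import Fraction

def overall_rate(K, m=3):
    """Overall code rate K / (3K + 4m) for K message bits and memory length m."""
    return Fraction(K, 3 * K + 4 * m)

for K in (40, 5114):
    rate = overall_rate(K)
    print(K, rate, round(float(rate), 4))
```

For the shortest block the rate is 10/33 (about 0.303), whereas for the longest block it is within a fraction of a percent of 1/3, so the tail bits matter mainly for short blocks.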

On the basis of this description, each block of the multiplexed output of the turbo
encoder is composed as follows:

• $\mathbf{c}^{(0)}$: vector of systematic bits (i.e., message bits), followed by

• $\mathbf{c}^{(1)}$ and $\mathbf{c}^{(2)}$: pair of vectors, representing the parity-check bits produced by the
top and bottom RSC encoders, respectively, then followed by

• $\mathbf{t}^{(1)}$ and $\mathbf{t}^{(2)}$: pair of vectors, representing encoder termination-tail bits for forcing
the top and bottom RSC constituent encoders back to the all-zero state,
respectively.

In the UMTS standard, the block length of the turbo code lies in the range [40, 5114].


Figure 10.30b shows a block diagram of the UMTS decoder. Specifically, proceeding from
top to bottom on the right-hand side of the figure, we have five sequences of a posteriori
L-values computed in the receiver, namely $L_p(\mathbf{c}^{(0)})$, $L_p(\mathbf{t}^{(1)})$, $L_p(\mathbf{c}^{(1)})$, $L_p(\mathbf{t}^{(2)})$, and $L_p(\mathbf{c}^{(2)})$;
these L-values correspond to the encoded sequences $\mathbf{c}^{(0)}$, $\mathbf{t}^{(1)}$, $\mathbf{c}^{(1)}$, $\mathbf{t}^{(2)}$, and $\mathbf{c}^{(2)}$, respectively.

Considering, first, how decoder 1 operates in the receiver, we find from Figure 10.30b
that it receives two input sequences of L-values, the first one of which, namely the a
posteriori L-value $L_p(\mathbf{c}^{(1)})$, comes directly from the channel. The other input, the a priori
L-value denoted by $L_{a,1}$, is made up of three components:

1. The a posteriori L-value, $L_p(\mathbf{c}^{(0)})$, which accounts for the received systematic bits, $\mathbf{c}^{(0)}$.

2. The reordered version of the extrinsic information produced by decoder 2, resulting
from the de-interleaver $\pi^{-1}$.

3. The a posteriori L-value, $L_p(\mathbf{t}^{(1)})$, attributed to the systematic vector of termination
bits, $\mathbf{t}^{(1)}$, which is appended to the sum of components 1 and 2 to complete $L_{a,1}$.

In a corresponding but slightly different way, decoder 2 receives two input sequences of
L-values, the first one of which, namely the a posteriori L-value $L_p(\mathbf{c}^{(2)})$, comes directly from
the channel. The other input, the a priori L-value $L_{a,2}$, is also made up of three components:

1. The reordered version of the a posteriori L-value, $L_p(\mathbf{c}^{(0)})$, due to the received vector
of systematic bits, $\mathbf{c}^{(0)}$, where the reordering is produced by the interleaver to the left
of the de-interleaver $\pi^{-1}$.

2. The reordered version of the extrinsic information produced by decoder 1, where
the reordering is performed by the second interleaver, π, to the right of the
de-interleaver $\pi^{-1}$.

3. The a posteriori L-value, $L_p(\mathbf{t}^{(2)})$, attributed to the systematic vector of
termination bits, $\mathbf{t}^{(2)}$; this time, however, $L_p(\mathbf{t}^{(2)})$ is removed before it is interleaved
and passed to decoder 2.
In Figure 10.31, we have plotted the BER chart for the iterative decoding process, using
the turbo codec of Figure 10.30. The results were obtained for the case of 5000 systematic
bits, as follows:

• For a prescribed $E_b/N_0$ ratio, the bit errors were averaged over 50 Monte Carlo runs.

• Each point in the BER chart was the result of 100 bits per point-count in the
decoding process.

• The computations were repeated for different values of $E_b/N_0$.

The remarkable points to observe from Figure 10.31 are summarized here:

1. In the course of just four iterations, the BER of the UMTS decoder drops to $10^{-14}$ at
an SNR = 3 dB, which, for all practical purposes, is zero.

2. The steepness of the BER plot on iteration 4 is showing signs of the turbo cliff, but is
not there yet. Unfortunately, to get there would require a great deal more
computation. (The turbo cliff is illustrated in Figure 10.32 in the next section.)


For a rather rudimentary but plausible approach to addressing the issue of computational
complexity in a fair-minded way, consider a convolutional code that has memory length
m = 6 and, therefore, requires

$$2^6 = 64$$

ACS operations for Viterbi decoding.

To match this computational complexity, using the turbo decoder of Figure 10.29b with
16 ACS operations, we need the following number of decoding iterations:

$$\frac{64}{16} = 4$$


Figure 10.31 The bit error rate (BER) diagram for the UMTS turbo decoder, using 5000
systematic bits and -3 dB SNR. Plot title: 5000-bit UMTS turbo code with BPSK modulation
in an AWGN channel. Horizontal axis: SNR (in dB).
Correspondingly, Figure 10.31 plots the BER chart for the turbo decoder for the sequences
of decoding iterations: 1, 2, 3, and 4. Of special interest is the BER chart for four
iterations, for which we find that the turbo decoder with BER ≈ $10^{-14}$ outperforms the
Viterbi decoder significantly for the same computational complexity, namely a total of 64
ACS operations.


In an idealized BER chart exemplified by that in Figure 10.32, we may identify three
distinct regions, described as follows:

1. Low BER region, for which the $E_b/N_0$ ratio is correspondingly low.

2. Waterfall region, also referred to as the turbo cliff in the turbo coding literature,
which is characterized by a persistent reduction in BER over the span of a small
fraction of dB in SNR.

3. BER floor region, where a rather small improvement in decoding performance is
achieved for medium to large values of SNR.

As informative as the BER chart of Figure 10.32 is, from a practical perspective it has a
serious drawback. Simply put, the BER chart lacks insight into the underlying dynamics
(i.e., convergence behavior) of iterative decoding algorithms, particularly around the
turbo-cliff region. Furthermore, since the BER floor occurs at low BERs, excessive simulation
runs are required.

The question is: how do we overcome this serious drawback of the BER chart? The 
answer lies in using the extrinsic information chart, or EXIT chart for short, which was 
formally introduced by ten Brink (2001). 

The EXIT chart is insightful because it provides a graphical procedure for visualizing
the underlying dynamics of the turbo decoding process for a prescribed $E_b/N_0$. Moreover,
the procedure provides a tool for the design of turbo codes characterized by good
performance in the turbo-cliff region. In any event, development of the EXIT chart
exploits the idea of mutual information in Shannon’s information theory, which was
discussed previously in Chapter 5.

Figure 10.32 Idealized BER chart for turbo decoding.


Consider a constituent decoder in the turbo decoder of Figure 10.29b, which is, for
convenience of presentation, labeled decoder 1; the other constituent decoder is labeled
decoder 2. Let $I_1(m_j; L_a(m_j))$ denote the mutual information between a transmitted message
bit $m_j$ and the a priori L-value $L_a(m_j)$ for a prescribed $E_b/N_0$. Correspondingly, let $L_p(m_j)$
denote the a posteriori L-value of message bit $m_j$ and let $I_2(m_j; L_p(m_j))$ denote the mutual
information between $m_j$ and $L_p(m_j)$ for the same $E_b/N_0$. Then, with $I_2(m_j; L_p(m_j))$ viewed as
a function of $I_1(m_j; L_a(m_j))$, we may express the extrinsic information transfer characteristic
of constituent decoder 1 for some operator $T(\cdot)$ and the prescribed $E_b/N_0$ as follows:

$$I_2(m_j; L_p(m_j)) = T(I_1(m_j; L_a(m_j)))$$

In the continuum, it is shown that both mutual informations, $I_1$ and $I_2$, lie within the range
[0,1]. Thus, a plot of $I_2(m_j; L_p(m_j))$ versus $I_1(m_j; L_a(m_j))$, depicted in Figure 10.33a, displays
graphically the extrinsic information transfer characteristic of the constituent decoder 1.
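
To give a feel for how such a mutual information can be evaluated numerically, the Python sketch below estimates the mutual information between an equiprobable bit and an L-value by Monte Carlo averaging. It rests on assumptions that are not part of the text: the L-values are modeled as Gaussian with mean $m\sigma^2/2$ and variance $\sigma^2$ (a common modeling choice in the EXIT-chart literature), and for such consistent L-values the mutual information equals $1 - E[\log_2(1 + e^{-mL})]$. The function names are hypothetical.

```python
import math
import random

def softplus_bits(x):
    """Return log2(1 + exp(-x)), evaluated without overflow for very negative x."""
    if x < -30.0:
        return -x / math.log(2.0)
    return math.log1p(math.exp(-x)) / math.log(2.0)

def mutual_information(sigma, n=20000, seed=0):
    """Monte Carlo estimate of I(m; L) under the assumed Gaussian L-value model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        m = rng.choice((-1, 1))                              # equiprobable message bit
        l_value = rng.gauss(m * sigma ** 2 / 2.0, sigma)     # assumed L-value distribution
        total += softplus_bits(m * l_value)
    return 1.0 - total / n

for sigma in (0.5, 1.0, 2.0, 4.0):
    print(sigma, round(mutual_information(sigma), 3))
```

The estimate grows from near 0 (uninformative L-values) toward 1 (fully reliable L-values) as sigma increases, consistent with both mutual informations lying within [0,1].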

Since the two constituent decoders are similar and they are connected together
sequentially inside a closed feedback loop, it follows that the extrinsic information transfer
characteristic of constituent decoder 2 is the mirror image of the curve in Figure 10.33a
with respect to the straight line $I_1 = I_2$, as shown in Figure 10.33b. With this relationship in
mind, we may go on to put the transfer characteristic curves of the two constituent
decoders side by side, but keeping the same horizontal and vertical axes of Figure 10.33a.
We thus get the composite picture depicted in Figure 10.33c. In effect, this latter figure
represents the input-output extrinsic transfer characteristic of the two constituent decoders
working together in a turbo-decoding algorithm for the prescribed $E_b/N_0$.

To elaborate on the practical utility of Figure 10.33a, suppose that the iterative 
turbo-decoding algorithm begins with $I_1 = 0$, representing the initial condition of constituent 
decoder 1 for the first iteration in the decoding process. Then, in proceeding forward, we 
keep the following two points in mind: 

• First, the a posteriori L-value of constituent decoder 1 becomes the a priori L-value 
of constituent decoder 2, and similarly when these two decoders are interchanged, as 
we proceed from one iteration to the next. 

• Second, the message bits $m_1, m_2, m_3, \ldots$ occur on consecutive iterations. 

Hence, we will experience the following sequence of extrinsic information transfers 
between the two constituent decoders from one message bit to the next one for some 
prescribed $E_b/N_0$: 

Initial condition: $I_1^{(1)}(m_1) = 0$. 

Iteration 1: message bit, $m_1$ 

Decoder 1: $I_1^{(1)}(m_1)$ defines $I_2^{(1)}(m_1)$. 

Decoder 2: $I_2^{(1)}(m_1)$ initiates $I_1^{(2)}(m_2)$ for iteration 2. 

Iteration 2: message bit, $m_2$ 

Decoder 1: $I_1^{(2)}(m_2)$ defines $I_2^{(2)}(m_2)$. 

Decoder 2: $I_2^{(2)}(m_2)$ initiates $I_1^{(3)}(m_3)$ for iteration 3. 

Iteration 3: message bit, $m_3$ 

Decoder 1: $I_1^{(3)}(m_3)$ defines $I_2^{(3)}(m_3)$. 

Decoder 2: $I_2^{(3)}(m_3)$ initiates $I_1^{(4)}(m_4)$ for iteration 4. 

Iteration 4: message bit, $m_4$ 

Decoder 1: $I_1^{(4)}(m_4)$ defines $I_2^{(4)}(m_4)$. 

Decoder 2: $I_2^{(4)}(m_4)$ initiates $I_1^{(5)}(m_5)$ for iteration 5. 

and so on. 

Figure 10.33 (BPSK modulation in an AWGN channel) (a) Extrinsic information transfer 
characteristic of decoder 1; (b) Extrinsic information transfer characteristic of decoder 2; 
(c) Input-output extrinsic transfer characteristic of the two constituent decoders working 
together; (d) EXIT chart, including the staircase (shown dashed) embracing the extrinsic 
information transfer characteristics of both constituent decoders. 

Proceeding in the way just described, we may construct the EXIT chart illustrated in Figure 
10.33d, which embodies a trajectory that moves from one constituent decoder to the other in 
the form of a staircase. Specifically, the extrinsic information transfer curve from constituent 
decoder 1 to constituent decoder 2 proceeds in a horizontal manner and, by the same token, 
the extrinsic information transfer curve from constituent decoder 2 to constituent decoder 1 
proceeds in a vertical manner. Hereafter, construction of the sequence of extrinsic 
information transfer curves from one constituent decoder to another is called the staircase- 
shaped extrinsic information transfer trajectory between constituent decoders 1 and 2. 
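
The staircase construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two transfer curves `t_open` and `t_closed` are hypothetical toy functions chosen for the sketch, not measured decoder characteristics, and decoder 2 is modeled as the mirror image of decoder 1 about the line $I_1 = I_2$.

```python
def staircase(transfer, n_iter=50, tol=1e-6):
    """Trace the staircase between the curve I2 = transfer(I1) of decoder 1
    and its mirror image (decoder 2). Returns the list of corner points."""
    x = 0.0                      # initial condition: a priori information I1 = 0
    points = [(x, 0.0)]
    for _ in range(n_iter):
        y = transfer(x)          # decoder 1 turns a priori info x into extrinsic info y
        points.append((x, y))
        x_next = transfer(y)     # mirror-image decoder 2 turns y into new a priori info
        points.append((x_next, y))
        if x_next - x < tol:     # no further progress: trajectory has stopped
            break
        x = x_next
    return points

# Hypothetical curve that stays above the diagonal (open tunnel).
def t_open(x):
    return 1.0 - 0.7 * (1.0 - x) ** 1.5

# Hypothetical curve that crosses the diagonal at 0.4 (closed tunnel).
def t_closed(x):
    return 0.2 + 0.5 * x

print(staircase(t_open)[-1])     # ends close to the (1, 1)-point
print(staircase(t_closed)[-1])   # stalls near the crossing point (0.4, 0.4)
```

With the open-tunnel curve the trajectory climbs to the neighborhood of the (1, 1)-point; with the closed-tunnel curve it stalls where the two curves intersect.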

Examination of the EXIT chart depicted in Figure 10.33d prompts us to make the 
following two observations: 

1. Provided that the SNR at the channel output is sufficiently high, then the extrinsic 
information transfer curve of constituent decoder 1 stays above the straight line 
$I_1 = I_2$, while the corresponding extrinsic information transfer curve of constituent 
decoder 2 stays below this line. It follows, therefore, that an open tunnel exists 
between the extrinsic information transfer curves of the two constituent decoders. 
Under this scenario, the turbo-decoding algorithm converges to a stable solution for 
the prescribed $E_b/N_0$. 

2. The estimates of extrinsic information in the turbo-decoding algorithm continually 
become more reliable from one iteration to the next as the stable solution is 
approached. 

If, however, in contrast to the picture depicted in Figure 10.33d, no open tunnel exists 
between the extrinsic information transfer curves of constituent decoders 1 and 2 when the 
prescribed $E_b/N_0$ is relatively low, then the turbo-decoding algorithm fails to converge 
(i.e., the turbo-decoding algorithm is unstable). This behavior is illustrated in the EXIT 
chart of Figure 10.34 where the SNR has been reduced compared to that in Figure 10.33. 


Figure 10.34 EXIT chart demonstrating nonconvergent behavior of the turbo decoder when 
the $E_b/N_0$ is reduced compared to that in Figure 10.33d (BPSK modulation in an AWGN channel). 

The stage is now set for us to introduce the following statement: 


It is the graphical simplicity of this important statement that makes the EXIT chart such a 
useful practical tool in the design of iterative decoding algorithms. 

Moreover, if it turns out that the EXIT curves of the two constituent decoders do not 
intersect before the (1, 1)-point of perfect convergence and the staircase-shaped decoding 
trajectory also succeeds in reaching this critical point, then a vanishingly low BER is 
expected (Hanzo, 2012). 


For an approximate model needed to display the underlying dynamics of iterative 
decoding algorithms, the first step is to assume that the a priori L-values for message bit 
$m_j$, namely $L_a(m_j)$, constitute independent Gaussian random variables. With $m_j = \pm 1$, the 
$L_a(m_j)$ assumes a variance $\sigma_a^2$ and a mean value $(\sigma_a^2/2)m_j$. Equivalently, we may express 
the statistical dependence of $L_a$ on $m_j$ as follows: 

$$L_a(m_j) = \left(\frac{\sigma_a^2}{2}\right) m_j + n_a \qquad (10.129)$$

where $n_a$ is the sample value of a zero-mean Gaussian random variable with variance $\sigma_a^2$. 

The rationale for the approximate Gaussian model just described is motivated by the 
following two points (Lin and Costello, 2003): 


1. For an AWGN channel with soft (i.e., unquantized) output, the log-likelihood ratio 
(L-value), denoted by $L_p$, of a transmitted message bit $m_j$ given the received 
signal $r_j^{(0)}$ may be modeled as follows (see Problem 10.36): 

$$L_p(m_j \mid r_j^{(0)}) = L_c r_j^{(0)} + L_a(m_j) \qquad (10.130)$$

where $L_c = 4(E_s/N_0)$ is the channel reliability factor defined in (10.91) and $L_a(m_j)$ is 
the a priori L-value of message bit $m_j$. The point to note here is that the product 
terms $L_c r_j^{(0)}$ for varying $j$ are independent Gaussian random variables with variance 
$2L_c$ and mean $\pm L_c$. 

2. Extensive Monte Carlo simulations of the a posteriori extrinsic L-values, $L_p(m_j)$, for 
a constituent decoder with large block length appear to support the Gaussian-model 
assumption of (10.129); see Wiberg et al. (1999). 
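
The first point can be checked numerically. The sketch below assumes a normalized channel model, $r_j = m_j + w_j$ with noise variance $N_0/(2E_s)$; this normalization and the function name `channel_l_values` are assumptions made for the illustration. Under that model the product $L_c r_j$ should have mean $L_c$ (for $m_j = +1$) and variance $2L_c$.

```python
import random
import statistics

def channel_l_values(es_over_n0, n_samples=20000, seed=1):
    """Samples of L_c * r_j for m_j = +1 under the assumed normalized
    AWGN model r_j = m_j + w_j, with Var[w_j] = N0 / (2 Es)."""
    rng = random.Random(seed)
    l_c = 4.0 * es_over_n0                        # channel reliability factor
    noise_std = (1.0 / (2.0 * es_over_n0)) ** 0.5
    return [l_c * (1.0 + rng.gauss(0.0, noise_std)) for _ in range(n_samples)]

samples = channel_l_values(es_over_n0=0.5)        # here L_c = 2
print("mean     ~", statistics.mean(samples))     # close to L_c
print("variance ~", statistics.variance(samples)) # close to 2 * L_c
```

The variance being twice the mean is the same relationship that (10.129) imposes on the a priori L-values, which is what makes the Gaussian model a natural choice.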

Accordingly, using the Gaussian approximation of (10.129), we may express the 
conditional probability density function of the a priori L-value as follows: 


$$f_{L_a}(\xi \mid m_j) = \frac{1}{\sqrt{2\pi}\,\sigma_a} \exp\left[-\frac{(\xi - m_j\sigma_a^2/2)^2}{2\sigma_a^2}\right] \qquad (10.131)$$

where $\xi$ is a dummy variable, representing a sample value of $L_a(m_j)$. Note also that $\xi$ is 
continuous whereas, of course, $m_j$ is discrete. It follows that in formulating the mutual 
information between the message bit $m_j = \pm 1$ and a priori L-value $L_a(m_j)$ we have a 
binary-input AWGN channel to deal with; such a channel was discussed previously in Example 5 
of Chapter 5 on information theory. Building on the results of that example, we may 
express the first desired mutual information, denoted by $I_1(m_j; L_a)$, as follows: 


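$$I_1(m_j; L_a) = \frac{1}{2} \sum_{m_j = -1, +1} \int_{-\infty}^{\infty} f_{L_a}(\xi \mid m_j) \log_2\left[\frac{2 f_{L_a}(\xi \mid m_j)}{f_{L_a}(\xi \mid m_j = -1) + f_{L_a}(\xi \mid m_j = +1)}\right] d\xi \qquad (10.132)$$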


where the summation accounts for the binary nature of the information bit $m_j$ and the 
integral accounts for the continuous nature of $L_a$. Using (10.131) and (10.132) and 
manipulating the results, we get (ten Brink, 2001): 

$$I_1(m_j; L_a) = 1 - \int_{-\infty}^{\infty} \frac{\exp\left[-(\xi - \sigma_a^2/2)^2 / (2\sigma_a^2)\right]}{\sqrt{2\pi}\,\sigma_a} \log_2[1 + \exp(-\xi)]\, d\xi$$


which, as expected, depends solely on the variance $\sigma_a^2$. To emphasize this fact, we introduce 
the new function 

$$J(\sigma_a) := I_1(m_j; L_a)$$

with the following two limiting values: 

$$\lim_{\sigma_a \to 0} J(\sigma_a) = 0$$

and 

$$\lim_{\sigma_a \to \infty} J(\sigma_a) = 1$$

In other words, we have 

$$0 \le I_1(m_j; L_a) \le 1$$

Moreover, $J(\sigma_a)$ increases monotonically with increasing $\sigma_a$, which means that if the 
value of the mutual information $I_1(m_j; L_a)$ is given, then the corresponding value of $\sigma_a$ is 
uniquely determined by the inverse formula: 

$$\sigma_a = J^{-1}(I_1)$$

and with it, the corresponding Gaussian random variable $L_a(m_j)$ defined in (10.129) is 
obtained. 
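
Since $J(\sigma_a)$ has no closed form, it is typically evaluated numerically. The following is a minimal sketch, assuming trapezoidal integration over $\pm 10$ standard deviations and bisection for the inverse; the function names, step counts, and search interval are illustrative choices, not part of the text.

```python
import math

def j_function(sigma_a, n_steps=4000):
    """Evaluate J(sigma_a) = I_1(m_j; L_a) by the trapezoidal rule applied
    to the integral 1 - E[log2(1 + exp(-xi))], xi ~ N(sigma^2/2, sigma^2)."""
    if sigma_a < 1e-9:
        return 0.0                                   # limiting value as sigma_a -> 0
    mean = sigma_a ** 2 / 2.0
    lo, hi = mean - 10.0 * sigma_a, mean + 10.0 * sigma_a
    h = (hi - lo) / n_steps
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma_a)
    total = 0.0
    for k in range(n_steps + 1):
        xi = lo + k * h
        weight = 0.5 if k in (0, n_steps) else 1.0   # trapezoid end-point weights
        pdf = norm * math.exp(-((xi - mean) ** 2) / (2.0 * sigma_a ** 2))
        total += weight * pdf * math.log1p(math.exp(-xi)) / math.log(2.0)
    return min(1.0, max(0.0, 1.0 - total * h))

def j_inverse(info, lo=0.0, hi=50.0, n_iter=60):
    """Invert J by bisection, relying on its monotonic increase in sigma_a."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if j_function(mid) < info:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sigma = j_inverse(0.5)
print(sigma, j_function(sigma))   # second value is close to 0.5
```

Given a target value of $I_1$, `j_inverse` supplies the $\sigma_a$ needed to generate a priori L-values according to (10.129).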

Referring back to (10.128), we note that for us to construct the EXIT chart we also need 
to know the second mutual information between the message bit $m_j$ and the a posteriori 
extrinsic L-value $L_p(m_j)$. To this end, we may build on the formula of (10.132) to write 

$$I_2(m_j; L_p) = \frac{1}{2} \sum_{m_j = -1, +1} \int_{-\infty}^{\infty} f_{L_p}(\xi \mid m_j) \log_2\left[\frac{2 f_{L_p}(\xi \mid m_j)}{f_{L_p}(\xi \mid m_j = -1) + f_{L_p}(\xi \mid m_j = +1)}\right] d\xi \qquad (10.137)$$

where, in a manner similar to the a priori mutual information $I_1(m_j; L_a(m_j))$, we also 
have 

$$0 \le I_2(m_j; L_p) \le 1$$

Accordingly, with the two mutual informations $I_1(m_j; L_a)$ and $I_2(m_j; L_p)$ at hand, we 
may go on to compute the EXIT chart for an iterative decoding algorithm by merely 
focusing on a single constituent decoder in the turbo decoding algorithm. 

The next issue to be considered is how to perform this computation, which we now 
address. 


For turbo codes having long interleavers, the approximate Gaussian model of (10.129) is 
good enough for practical purposes. Hence, we may use this model to formulate the 
traditional histogram method, described in ten Brink (2001), to compute the EXIT chart. 
Specifically, (10.137) is used to compute the mutual information $I_2(m_j; L_p)$ for a 
prescribed $E_b/N_0$. To this end, Monte Carlo simulation (i.e., histogram measurements) is 
used to compute the required probability density function, $f_{L_p}(\xi \mid m_j)$, on which no 
Gaussian assumption can be imposed for obvious reasons. Computation of this probability 
density function is central to the EXIT chart, which may proceed in a step-by-step manner 
for a prescribed $E_b/N_0$, as follows: 

Step 1. Apply the independent Gaussian random variable defined in (10.129) to constituent 
decoder 1 in the turbo decoder. The corresponding value of the mutual information 
$I_1(m_j; L_a)$ is obtained by choosing the variance $\sigma_a^2$ in accordance with (10.129). 

Step 2. Using Monte Carlo simulation, compute the probability density function 
$f_{L_p}(\xi \mid m_j)$. Hence, compute the second mutual information $I_2(m_j; L_p)$, and with it a certain 
point for the extrinsic information transfer curve of constituent decoder 1 is determined. 

Step 3. Continue Steps 1 and 2 until we have sufficient points to construct the extrinsic 
information transfer curve of constituent decoder 1. 

Step 4. Construct the extrinsic information transfer curve of constituent decoder 2 as the 
mirror image of the curve for constituent decoder 1 computed in Step 3, respecting the 
straight line $I_1 = I_2$. 

Step 5. Construct the EXIT chart for the turbo decoder by combining the extrinsic 
information transfer curves of constituent decoders 1 and 2. 

Step 6. Starting with some prescribed initial condition, for example $I_1(m_1) = 0$ for 
message bit $m_1$, construct the staircase information transfer trajectory between constituent 
decoders 1 and 2. 

A desirable feature of the histogram method for computing the EXIT chart is the fact that, 
except for the approximate Gaussian model of (10.129), there are no other assumptions 
needed for the computations involved in Steps 1 through 6. 
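
The core of Step 2 is the histogram estimate of the mutual information in (10.137). The sketch below shows that estimate in isolation; the function names and bin count are illustrative assumptions, and the L-values fed to it are drawn from the Gaussian model of (10.129) rather than from an actual BCJR decoder, simply to keep the example self-contained. Both bit values are assumed to occur in the data.

```python
import math
import random

def gaussian_l_values(sigma_a, n_samples, seed=0):
    """Bits m_j in {-1, +1} and L-values drawn per the model of (10.129)."""
    rng = random.Random(seed)
    bits = [rng.choice((-1, 1)) for _ in range(n_samples)]
    l_values = [0.5 * sigma_a ** 2 * m + rng.gauss(0.0, sigma_a) for m in bits]
    return l_values, bits

def mutual_info_histogram(l_values, bits, n_bins=60):
    """Histogram estimate of I(m; L) for equiprobable bits: the conditional
    densities in the integral are replaced by per-bin relative frequencies."""
    lo, hi = min(l_values), max(l_values)
    width = (hi - lo) / n_bins or 1.0
    counts = {+1: [0] * n_bins, -1: [0] * n_bins}
    for l, m in zip(l_values, bits):
        k = min(int((l - lo) / width), n_bins - 1)
        counts[m][k] += 1
    n_pos, n_neg = sum(counts[+1]), sum(counts[-1])
    info = 0.0
    for k in range(n_bins):
        p_pos = counts[+1][k] / n_pos        # estimate of P(bin k | m = +1)
        p_neg = counts[-1][k] / n_neg        # estimate of P(bin k | m = -1)
        for p in (p_pos, p_neg):
            if p > 0.0:
                info += 0.5 * p * math.log2(2.0 * p / (p_pos + p_neg))
    return info

l_vals, bits = gaussian_l_values(sigma_a=2.0, n_samples=20000)
print(mutual_info_histogram(l_vals, bits))
```

Note that the estimator needs the true bits `bits`, which is the defining feature of the histogram method.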


For another method to compute EXIT charts, we may use the so-called averaging method, 
which represents an alternative approach to the histogram method. 

As a reminder, the basic issue in computing an EXIT chart is to measure the mutual 
information between the information bits, $m_j$, at the turbo encoder input in the transmitter 
and the corresponding L-values produced at the output of the corresponding BCJR decoder 
in the receiver. Due to the inherently nonlinear input-output characteristic of the BCJR 
decoder, the underlying distribution of the L-values is not only unknown, but also highly 
likely to be non-Gaussian as well, thereby complicating the measurement. To get around this 
difficulty, we may invoke the ergodic theorem, which was discussed previously in Chapter 4 
on stochastic processes. As explained therein, under certain conditions, it is feasible to 
replace the operation of ensemble averaging (i.e., expectation) with time averaging. Thus, in 
proceeding along this ergodic path, we have a new nonlinear transformation, where the time 
average of a large set of L-value samples, available at the output of the BCJR decoder, 
provides an estimate of the desired mutual information; moreover, it does so without 
requiring knowledge of the original data (i.e., the $m_j$). It is for this reason that the second 
method of computing EXIT charts is called the averaging method. 
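
A minimal sketch of such a time-average estimator is given below. The specific nonlinearity is an assumption of this sketch: it uses the form $1 - \overline{H_b(1/(1+e^{|L|}))}$, where $H_b$ is the binary entropy function, which holds for equiprobable bits when the L-values are trustworthy in the sense that $P(m_j = +1 \mid L) = 1/(1 + e^{-L})$. The text does not spell out the formula, so this should be read as one common realization rather than the definitive one.

```python
import math
import random

def mutual_info_averaging(l_values):
    """Time-average estimate of I(m; L) that uses only the L-values.
    Assumes the L-values are consistent, i.e. P(m = +1 | L) = 1/(1 + exp(-L))."""
    total = 0.0
    for l in l_values:
        p = 1.0 / (1.0 + math.exp(-abs(l)))   # confidence in the hard decision
        q = 1.0 - p
        entropy = -p * math.log2(p)
        if q > 0.0:
            entropy -= q * math.log2(q)
        total += entropy                      # binary entropy of that confidence
    return 1.0 - total / len(l_values)

# Demonstration on L-values drawn from the Gaussian model of (10.129).
rng = random.Random(7)
sigma_a = 2.0
bits = [rng.choice((-1, 1)) for _ in range(20000)]
l_vals = [0.5 * sigma_a ** 2 * m + rng.gauss(0.0, sigma_a) for m in bits]
print(mutual_info_averaging(l_vals))          # the bits themselves are never used
```

The estimator never touches `bits`; only the magnitudes of the L-values enter the average.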

Just as the use of a single constituent decoder suffices for computing an EXIT chart in 
the histogram method, the same decoding scenario applies equally well to the averaging 
method. It is with this point in mind that the underlying scheme for the averaging method 
is as depicted in Figure 10.35. Most importantly, this scheme is designed in such a way 
that the following requirements are satisfied: 

1. Implementations of channel estimation, carrier recovery, modulation, and 
demodulation are all perfect. 

2. The turbo decoder is perfectly synchronized with the turbo encoder. 

3. The BCJR algorithm or exact equivalent (i.e., the log-MAP algorithm) is used to 
optimize the turbo decoder. 

Moreover, the following analytic correspondences between the constituent encoder 1 at 
the top of Figure 10.35 and the turbo decoder 2 at the bottom of the figure are carefully 
noted: the code vectors $c^{(0)}$, $c^{(1)}$, and termination vector $t^{(1)}$ in the encoder map onto the a 
posteriori L-values $L_p(r^{(0)})$, $L_p(r^{(1)})$, and $L_p(z^{(1)})$ in the decoder, respectively. 

It can therefore be justifiably argued that in light of these rigorous requirements, the 
underlying algorithm for the averaging method is well designed and therefore trustworthy 
in the following sense: in the course of computing the EXIT chart, the algorithm trusts 
what the computed L-values actually say; that is, they do not under or over represent their 
confidence in the message bits. This important characteristic of the averaging method is to 
be contrasted against the histogram method. Indeed, it is for this reason that the histogram 
method compares the L-values against the true values of the message bits, hence requiring 
knowledge of them. 

In summary, we may say that trustworthy L-values are those L-values that satisfy the 
consistency condition. A simple way of testing this condition is to do the following: use 
the averaging and histogram methods to compute two respective sets of L-values. If, then, 
both methods yield the same value for the mutual information, then the consistency 
condition is satisfied (Maunder, 2012). 

Procedure for Measuring the EXIT Chart 

Referring back to the scheme of Figure 10.35, the demultiplexer outputs denoted by $r^{(0)}$, 
$r^{(1)}$, and $z^{(1)}$ represent the L-values corresponding to the encoder outputs $c^{(0)}$, $c^{(1)}$, and $t^{(1)}$, 
respectively. Thus, following the way in which the turbo decoder of Figure 10.33b was 
described, the internally generated input applied to BCJR decoder 1 assumes exactly the 
same value as that produced in computing the BER. With the objective being that of 
constructing an EXIT chart, we need to provide values for the mutual information that was 
previously denoted by $I_1(m_j; L_a)$, where $m_j$ is the $j$th message bit and $L_a$ is the corresponding 
a priori L-value. As indicated, the mutual information is the externally supplied input 
applied to the block labeled L-value generator. We may therefore assign to $I_1(m_j; L_a)$ any 
values that we like. However, recognizing that $0 \le I_1(m_j; L_a) \le 1$, a sensible choice of values 
for $I_1(m_j; L_a)$ would be the set {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}. Such a 
choice provides inputs for eleven different experiments based on the averaging method. 

Figure 10.35 Schematic diagram for computing the EXIT chart for the UMTS-turbo code, based on 
the averaging method. 

For each one of these inputs, the constituent decoder 1 produces a corresponding value 
for the a posteriori extrinsic L-value, $L_p$, which is applied to the block labeled 
mutual-information computer in Figure 10.35. The resulting output of this second computation is 
the second desired mutual information, namely $I_2(m_j; L_p)$. At this point, a question that 
naturally arises is: how can this computation be performed in the absence of the message bit $m_j$? 
The answer to this question is that, as pointed out previously, the averaging method is 
designed to trust what the extrinsic L-values say; hence, the computation of $I_2(m_j; L_p)$ does 
not require any knowledge of the message bit $m_j$. 

Computer-Oriented Experiments for EXIT Charts 

The EXIT charts plotted in Figures 10.33 and 10.34 were computed in Matlab for the 
UMTS codec using the averaging method discussed in Example 9 and 5000 message bits. 

In Figure 10.33, the computations were performed for the SNR: $E_b/N_0 = -4$ dB. In this 
case, the tunnel is open, indicating that the UMTS decoder is convergent (stable). 

In Figure 10.34, the computation was performed for a smaller SNR: $E_b/N_0 = -6$ dB. In 
this case, the tunnel is closed, indicating that the UMTS decoder is nonconvergent. 

These computer experiments confirm the practical importance of EXIT charts when the 
issue of interest is that of evaluating the dynamic behavior of a turbo decoder. 

Low-Density Parity-Check Codes 


Turbo codes discussed in Section 10.12 and LDPC codes to be discussed in this section 
belong to a broad family of error-control coding techniques, collectively called compound 
probabilistic codes. The two most important advantages of LDPC codes over turbo codes are: 

• absence of low-weight codewords and 

• iterative decoding of lower complexity. 

With regard to the issue of low-weight codewords, we usually find that a small number of 
codewords in a turbo code are undesirably close to the given codeword. Owing to this 
closeness in weights, once in a while the channel noise causes the transmitted codeword to 
be mistaken for a nearby codeword. Indeed, it is this behavior that is responsible for the error 
floor that was mentioned in Section 10.13. In contrast, LDPC codes can be easily 
constructed so that they do not have such low-weight codewords and they can, therefore, 
achieve vanishingly small BERs. (The error-floor problem in turbo codes can be alleviated 
by careful design of the interleaver.) 

Turning next to the issue of decoding complexity, we note that the computational 
complexity of a turbo decoder is dominated by the MAP algorithm, which operates on the 
trellis for representing the convolutional code used in the encoder. The number of 
computations in each recursion of the MAP algorithm scales linearly with the number of 
states in the trellis. Commonly used turbo codes employ trellises with 16 states or more. In 
contrast, LDPC codes use a simple parity-check trellis that has just two states. 
Consequently, the decoders for LDPC codes are significantly simpler to design than those 
for turbo codes. However, a practical objection to the use of LDPC codes is that, for 
large block lengths, their encoding complexity is high compared with turbo codes. 

It can be argued that LDPC codes and turbo codes complement each other, giving the 
designer more flexibility in selecting the right code for extraordinary decoding performance. 


LDPC codes are specified by a parity-check matrix denoted by A, which is purposely 
chosen to be sparse; that is, the matrix consists mainly of 0s and a small number of 1s. In 
particular, we speak of $(n, t_c, t_r)$ LDPC codes, where $n$ denotes the block length, $t_c$ denotes 
the weight (i.e., number of 1s) in each column of the matrix A, and $t_r$ denotes the weight of 
each row with $t_r > t_c$. The rate of such an LDPC code is defined by 

$$r = 1 - \frac{t_c}{t_r} \qquad (10.139)$$

whose validity may be justified as follows. Let $\rho$ denote the density of 1s in the 
parity-check matrix A. Then, following the terminology introduced in Section 10.4, we may set 

$$t_c = \rho(n - k)$$

and 

$$t_r = \rho n$$

where $(n - k)$ is the number of rows in A and $n$ is the number of columns (i.e., the block 
length). Therefore, dividing $t_c$ by $t_r$, we get 

$$\frac{t_c}{t_r} = 1 - \frac{k}{n}$$

By definition, the code rate of a block code is $k/n$; hence, the result of (10.139) follows. 
For this result to hold, however, the rows of A must be linearly independent. 

The structure of LDPC codes is well portrayed by bipartite graphs, which were 
introduced by Tanner (1981) and, therefore, are known as Tanner graphs. Figure 10.36 
shows such a graph for the example code of $n = 10$, $t_c = 3$, and $t_r = 5$. The left-hand nodes in 
the graph are variable (symbol) nodes, which correspond to elements of the codeword. The 
right-hand nodes of the graph are check nodes, which correspond to the set of parity-check 
constraints satisfied by codewords in the code. LDPC codes of the type exemplified by the 
graph of Figure 10.36 are said to be regular, in that all the nodes of a similar kind have 
exactly the same degree. In Figure 10.36, the degree of the variable nodes is $t_c = 3$ and the 
degree of the check nodes is $t_r = 5$. As the block length $n$ approaches infinity, each check node 
is connected to a vanishingly small fraction of variable nodes; hence the term “low density.” 



Figure 10.36 Bipartite graph of the (10, 3, 5) LDPC code. 

The matrix A is constructed by putting 1s in A at random, subject to the regularity 
constraints: 

• each column of matrix A contains a small fixed number $t_c$ of 1s; 

• each row of the matrix contains a small fixed number $t_r$ of 1s. 

In practice, these regularity constraints are often violated slightly in order to avoid having 
linearly dependent rows in the parity-check matrix A. 
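
One simple way to satisfy both regularity constraints exactly is a banded construction in the style of Gallager, sketched below. This is a substitute technique chosen for illustration, not necessarily the random placement the text has in mind: a base band with one block of $t_r$ consecutive 1s per row is stacked $t_c$ times, each copy after the first with randomly permuted columns. The function name and the requirement that $t_r$ divide $n$ are assumptions of the sketch.

```python
import random

def gallager_parity_check(n, t_c, t_r, seed=0):
    """Build an (n*t_c/t_r)-by-n binary matrix with column weight t_c and
    row weight t_r by stacking column-permuted copies of a base band."""
    if n % t_r != 0:
        raise ValueError("this sketch requires t_r to divide n")
    rng = random.Random(seed)
    rows_per_band = n // t_r
    # Base band: row i has 1s in columns i*t_r ... (i+1)*t_r - 1.
    base = [[1 if j // t_r == i else 0 for j in range(n)]
            for i in range(rows_per_band)]
    matrix = [row[:] for row in base]
    for _ in range(t_c - 1):
        perm = list(range(n))
        rng.shuffle(perm)                     # random column permutation
        matrix.extend([[row[perm[j]] for j in range(n)] for row in base])
    return matrix

A = gallager_parity_check(n=10, t_c=3, t_r=5)
for row in A:
    print("".join(map(str, row)))
```

Each band contributes exactly one 1 to every column, so the rows of each band sum to the all-ones vector. The rows of the stacked matrix are therefore linearly dependent, which illustrates why the regularity constraints are often relaxed slightly in practice.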

Unlike the linear block codes discussed in Section 10.4, the parity-check matrix A of 
LDPC codes is not systematic (i.e., it does not have the parity-check bits appearing in 
diagonal form); hence the use of a symbol different from that used in Section 10.4. 
Nevertheless, for coding purposes, we may derive a generator matrix G for LDPC codes 
by means of Gaussian elimination performed in modulo-2 arithmetic; this procedure is 
illustrated later in Example 10. Following the terminology introduced in Section 10.4, the 
1-by-n code vector c is first partitioned as shown by 

c = [b | m] 

where m is the 1-by-$k$ message vector and b is the 1-by-$(n - k)$ parity-check vector; see 
(10.9). Correspondingly, the parity-check matrix A is partitioned as 

$$\mathbf{A}^T = \begin{bmatrix} \mathbf{A}_1 \\ \mathbf{A}_2 \end{bmatrix}$$

where $\mathbf{A}_1$ is a square matrix of dimensions $(n - k) \times (n - k)$ and $\mathbf{A}_2$ is a rectangular matrix 
of dimensions $k \times (n - k)$; transposition symbolized by the superscript T is used in the 
partitioning of matrix A for convenience of presentation. Imposing a constraint on the 
LDPC code similar to that of (10.16) we may write 

$$[\mathbf{b} \mid \mathbf{m}] \begin{bmatrix} \mathbf{A}_1 \\ \mathbf{A}_2 \end{bmatrix} = \mathbf{0} \qquad (10.140)$$

or, equivalently, 

$$\mathbf{b}\mathbf{A}_1 + \mathbf{m}\mathbf{A}_2 = \mathbf{0} \qquad (10.141)$$

Recall from (10.7) that the vectors m and b are related by 

$$\mathbf{b} = \mathbf{m}\mathbf{P}$$

where P is the coefficient matrix. Hence, substituting this relation into (10.141), we readily 
find that, after ignoring the common factor m for any nonzero message vector, the 
coefficient matrix of LDPC codes satisfies the condition 

$$\mathbf{P}\mathbf{A}_1 + \mathbf{A}_2 = \mathbf{0} \qquad (10.142)$$

This equation holds for all nonzero message vectors and, in particular, for m in the form 
[0 ... 0 1 0 ... 0] that will isolate individual rows of the generator matrix. 

Solving (10.142) for matrix P, we get 

$$\mathbf{P} = \mathbf{A}_2 \mathbf{A}_1^{-1} \qquad (10.143)$$

where $\mathbf{A}_1^{-1}$ is the inverse of matrix $\mathbf{A}_1$, which is naturally defined in modulo-2 arithmetic. 
Finally, building on (10.12), the generator matrix of LDPC codes is defined by 

$$\mathbf{G} = [\mathbf{P} \mid \mathbf{I}_k] = [\mathbf{A}_2 \mathbf{A}_1^{-1} \mid \mathbf{I}_k] \qquad (10.144)$$

where $\mathbf{I}_k$ is the $k$-by-$k$ identity matrix. 

It is important to note that if we take the parity-check matrix A for some arbitrary 
LDPC code and just pick $(n - k)$ columns of A at random to form a square matrix $\mathbf{A}_1$, there 
is no guarantee that $\mathbf{A}_1$ will be nonsingular (i.e., the inverse $\mathbf{A}_1^{-1}$ will exist), even if the 
rows of A are linearly independent. In fact, for a typical LDPC code with large block 
length $n$, such a randomly selected $\mathbf{A}_1$ is highly unlikely to be nonsingular because it is 
very likely that at least one row of $\mathbf{A}_1$ will be all 0s. Of course, when the rows of A are 
linearly independent, there will be some set of $(n - k)$ columns of A that will result in a 
nonsingular $\mathbf{A}_1$, to be illustrated in Example 10. For some construction methods for LDPC 
codes, the first $(n - k)$ columns of A may be guaranteed to produce a nonsingular $\mathbf{A}_1$, or at 
least do so with a high probability, but that is not true in general. 

(10, 3, 5) LDPC Code 

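Consider the Tanner graph of Figure 10.36 pertaining to a (10, 3, 5) LDPC code. The 
parity-check matrix of the code is defined by 

$$\mathbf{A} = \begin{bmatrix}
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 1 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 1 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 & 1
\end{bmatrix}$$

which appears to be random, while maintaining the regularity constraints: $t_c = 3$ and $t_r = 5$. 
Partitioning the matrix A in the manner just described, we write 

$$\mathbf{A}_1 = \begin{bmatrix}
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 & 0
\end{bmatrix}, \qquad
\mathbf{A}_2 = \begin{bmatrix}
0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 1
\end{bmatrix}$$

To derive the inverse of matrix $\mathbf{A}_1$, we first use (10.140) to write 

$$\underbrace{[b_0, b_1, b_2, b_3, b_4, b_5]}_{\mathbf{b}}
\underbrace{\begin{bmatrix}
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 & 0
\end{bmatrix}}_{\mathbf{A}_1}
= \underbrace{[u_0, u_1, u_2, u_3, u_4, u_5]}_{\mathbf{u} = \mathbf{m}\mathbf{A}_2}$$

where we have introduced the vector u to denote the matrix product $\mathbf{m}\mathbf{A}_2$. By using 
Gaussian elimination, modulo-2, the matrix $\mathbf{A}_1$ is transformed into lower triangular form 
(i.e., all the elements above the main diagonal are zero), as shown by 

$$\mathbf{A}_1 \rightarrow \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 1
\end{bmatrix}$$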

This transformation is achieved by the following modulo-2 additions performed on the 
columns of square matrix Aj: 

• columns 1 and 2 are added to column 3; 

• column 2 is added to column 4; 

• columns 1 and 4 are added to column 5; 

• columns 1, 2, and 5 are added to column 6. 

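Correspondingly, the vector u is transformed as shown by 

$$\mathbf{u} \rightarrow [u_0,\; u_1,\; u_0 + u_1 + u_2,\; u_1 + u_3,\; u_0 + u_3 + u_4,\; u_0 + u_1 + u_4 + u_5]$$

Accordingly, premultiplying the transformed matrix $\mathbf{A}_1$ by the parity vector b, using 
successive eliminations in modulo-2 arithmetic working backwards and putting the 
solutions for the elements of the parity vector b in terms of the elements of the vector u in 
matrix form, we get 

$$\underbrace{[u_0, u_1, u_2, u_3, u_4, u_5]}_{\mathbf{u}}
\underbrace{\begin{bmatrix}
0 & 0 & 1 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 1
\end{bmatrix}}_{\mathbf{A}_1^{-1}}
= \underbrace{[b_0, b_1, b_2, b_3, b_4, b_5]}_{\mathbf{b}}$$

The inverse of matrix $\mathbf{A}_1$ is therefore 

$$\mathbf{A}_1^{-1} = \begin{bmatrix}
0 & 0 & 1 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 1
\end{bmatrix}$$

Using the given value of $\mathbf{A}_2$ and the value of $\mathbf{A}_1^{-1}$ just found, the matrix product $\mathbf{A}_2 \mathbf{A}_1^{-1}$ 
is given by 

$$\mathbf{A}_2 \mathbf{A}_1^{-1} = \begin{bmatrix}
1 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 1 & 0
\end{bmatrix}$$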

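Finally, using (10.144), the generator of the (10, 3, 5) LDPC code is defined by 

$$\mathbf{G} = [\mathbf{A}_2 \mathbf{A}_1^{-1} \mid \mathbf{I}_k] = \left[\begin{array}{cccccc|cccc}
1 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 1
\end{array}\right]$$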

It is important to recognize that the LDPC code described in this example is intended only 
for the purpose of illustrating the procedure involved in the generation of such a code. In 
practice, the block length n is orders of magnitude larger than that considered in this 
example. Moreover, in constructing the matrix A, we may constrain all pairs of columns to 
have a matrix overlap (i.e., inner product of any two columns in matrix A) not to exceed 
one; such a constraint, over and above the regularity constraints, is expected to improve 
the performance of LDPC codes. Unfortunately, with a small block length as that 
considered in this example, it is difficult to satisfy this additional requirement. 
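
The arithmetic of this example can be reproduced with a short program. The sketch below is a minimal modulo-2 Gauss-Jordan inversion (row operations on an augmented matrix, rather than the column operations used in the text); the helper names `gf2_inverse` and `gf2_matmul` are illustrative. It recomputes $\mathbf{A}_1^{-1}$, forms $\mathbf{P} = \mathbf{A}_2\mathbf{A}_1^{-1}$ and $\mathbf{G} = [\mathbf{P} \mid \mathbf{I}_k]$, and confirms that every row of G satisfies the parity-check constraint.

```python
def gf2_inverse(matrix):
    """Invert a square binary matrix over GF(2) by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[col])]  # modulo-2 row add
    return [row[n:] for row in aug]

def gf2_matmul(x, y):
    """Matrix product with modulo-2 arithmetic."""
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*y)] for row in x]

def rows(*strings):
    return [[int(ch) for ch in s] for s in strings]

A1 = rows("101010", "110100", "010110", "100101", "011010", "101100")
A2 = rows("010101", "010011", "101001", "001011")

A1_inv = gf2_inverse(A1)
P = gf2_matmul(A2, A1_inv)                       # coefficient matrix
k = len(A2)
G = [p_row + [int(i == j) for j in range(k)] for i, p_row in enumerate(P)]
A_T = A1 + A2                                    # transpose of the parity-check matrix

for g_row in G:
    print("".join(map(str, g_row)))
print(gf2_matmul(G, A_T))                        # all-zero 4-by-6 matrix
```

The all-zero product confirms that each row of G, and hence every codeword $\mathbf{c} = \mathbf{m}\mathbf{G}$, satisfies $\mathbf{c}\mathbf{A}^T = \mathbf{0}$.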


In practice, the block length of an LDPC code is large, ranging from $10^3$ to $10^6$, which 
means that the number of codewords in a particular code is correspondingly large. 
Consequently, the algebraic analysis of LDPC codes is rather difficult. As such, it is much 
more productive to perform a statistical analysis on an ensemble of LDPC codes. Such an 
analysis permits us to make statistical statements about certain properties of member codes 
in the ensemble. An LDPC code with these properties can be found with high probability by 
a random selection from the ensemble, hence the inherent probabilistic structure of the code. 
Among these properties, the minimum distance of the member codes is of particular 
interest. From Section 10.4, we recall that the minimum distance of a linear block code is 
the smallest Hamming distance between any pair of code vectors in the code. In contrast, 
we say: 


Elsewhere, it is shown that as the block length $n$ increases for fixed $t_c \ge 3$ and $t_r > t_c$, the 
probability distribution of the minimum distance can be overbounded by a function that 
approaches a unit step function at a fixed fraction $\Delta_{t_c t_r}$ of the block length $n$. Thus, for 
large $n$, practically all the LDPC codes in the ensemble have a minimum distance of at 
least $n\Delta_{t_c t_r}$. Table 10.7 presents the rate and $\Delta_{t_c t_r}$ of LDPC codes for different values of 
the weight-pair $(t_c, t_r)$. From this table we see that for $t_c = 3$ and $t_r = 6$, the code rate $r$ 
attains its highest value of 1/2 and the fraction $\Delta_{t_c t_r}$ attains its smallest value; hence the 
preferred choice of $t_c = 3$ and $t_r = 6$ in constructing the LDPC code. 


At the transmitter, a message vector m is encoded into a code vector c = mG, where G is 
the generator matrix for a specified weight-pair $(t_c, t_r)$ and, therefore, minimum distance 
$d_{\min}$. The vector c is transmitted over a noisy channel to produce the received vector 

$$\mathbf{r} = \mathbf{c} + \mathbf{e}$$

where e is the error vector due to channel noise; see (10.17). By construction, the matrix A 
is the parity-check matrix of the LDPC code; that is, $\mathbf{A}\mathbf{G}^T = \mathbf{0}$. Given the received vector r, 
the bit-by-bit decoding problem is to find the most probable vector $\hat{\mathbf{c}}$ that satisfies the 
condition $\hat{\mathbf{c}}\mathbf{A}^T = \mathbf{0}$ in accordance with the constraint imposed on matrix A in (10.140). 

In what follows, a bit refers to an element of the received vector r and a check refers to 
a row of matrix A. Let $\mathcal{J}(i)$ denote the set of bits that participate in check $i$. Let $\mathcal{I}(j)$ denote 
the set of checks in which bit $j$ participates. A set $\mathcal{J}(i)$ that excludes bit $j$ is denoted by 
$\mathcal{J}(i)\setminus j$. Likewise, a set $\mathcal{I}(j)$ that excludes check $i$ is denoted by $\mathcal{I}(j)\setminus i$. 


Table 10.7 The rate and fractional term of LDPC codes for varying weight-pairs 

The decoding algorithm has two alternating steps: a horizontal step and a vertical step, 
which run along the rows and columns of matrix A, respectively. In the course of decoding, 
two probabilistic quantities associated with nonzero elements of matrix A are alternately 
updated. One quantity, denoted by $P_{ij}^{(x)}$, defines the probability that bit $j$ is symbol $x$ (i.e., 
symbol 0 or 1), given the information derived via checks performed in the horizontal step 
except for check $i$. The second quantity, denoted by $Q_{ij}^{(x)}$, defines the probability that check $i$ 
is satisfied given that bit $j$ is fixed at the value $x$ and the other bits have the probabilities 
$P_{ij'}$, where we have $j' \in \mathcal{J}(i)\setminus j$. 

The LDPC decoding algorithm then proceeds as follows. 


Initialization 

The variables $P_{ij}^{(0)}$ and $P_{ij}^{(1)}$ are set equal to the a priori probabilities $p_j^{(0)}$ and $p_j^{(1)}$ of 
symbols 0 and 1, respectively, with $p_j^{(0)} + p_j^{(1)} = 1$ for all $j$. 

Horizontal Step 

In the horizontal step of the algorithm, we run through the checks $i$. To this end, define 

$$\Delta P_{ij} = P_{ij}^{(0)} - P_{ij}^{(1)}$$

For each weight-pair $(i, j)$, compute 

$$\Delta Q_{ij} = \prod_{j' \in \mathcal{J}(i)\setminus j} \Delta P_{ij'}$$

Hence, set 

$$Q_{ij}^{(0)} = \frac{1}{2}(1 + \Delta Q_{ij})$$

$$Q_{ij}^{(1)} = \frac{1}{2}(1 - \Delta Q_{ij})$$

Vertical Step 

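In the vertical step of the algorithm, values of the probabilities $P_{ij}^{(0)}$ and $P_{ij}^{(1)}$ are updated 
using the quantities computed in the horizontal step. In particular, for each bit j, compute 

$$ P_{ij}^{(0)} = \alpha_{ij}\, p_j^{(0)} \prod_{i' \in \mathcal{I}(j)\backslash i} Q_{i'j}^{(0)} $$

$$ P_{ij}^{(1)} = \alpha_{ij}\, p_j^{(1)} \prod_{i' \in \mathcal{I}(j)\backslash i} Q_{i'j}^{(1)} $$

where the scaling factor $\alpha_{ij}$ is chosen so as to satisfy the condition 

$$ P_{ij}^{(0)} + P_{ij}^{(1)} = 1 \quad \text{for all } ij $$

In the vertical step, we may also update the pseudo-posterior probabilities: 

$$ P_j^{(0)} = \alpha_j\, p_j^{(0)} \prod_{i \in \mathcal{I}(j)} Q_{ij}^{(0)} $$

$$ P_j^{(1)} = \alpha_j\, p_j^{(1)} \prod_{i \in \mathcal{I}(j)} Q_{ij}^{(1)} $$

where $\alpha_j$ is chosen so as to make 

$$ P_j^{(0)} + P_j^{(1)} = 1 \quad \text{for all } j $$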

The quantities obtained in the vertical step are used to compute a tentative estimate $\hat{\mathbf{c}}$. If 
the condition $\hat{\mathbf{c}}\mathbf{A}^T = \mathbf{0}$ is satisfied, the decoding algorithm is terminated. Otherwise, the 
algorithm goes back to the horizontal step. If after some maximum number of iterations 
(e.g., 100 or 200) there is no valid decoding, then a decoding failure is declared. The 
decoding procedure described herein is a special case of the general low-complexity 
sum-product algorithm. 
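
The horizontal and vertical steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production decoder: the function name ldpc_sum_product, the dictionary-based storage, and the toy 2-by-3 parity-check matrix are assumptions introduced here, and the a priori probabilities passed in are assumed to already reflect the channel observation of each received bit.

```python
from math import prod

def ldpc_sum_product(A, p1, max_iter=100):
    """Probability-domain sum-product decoding (illustrative sketch).

    A  : parity-check matrix as a list of rows of 0/1 entries.
    p1 : p1[j] is the a priori probability that bit j equals 1.
    Returns (estimate, iterations), or (None, max_iter) on decoding failure.
    """
    m, n = len(A), len(A[0])
    J = [[j for j in range(n) if A[i][j]] for i in range(m)]  # bits in check i
    I = [[i for i in range(m) if A[i][j]] for j in range(n)]  # checks on bit j
    p0 = [1.0 - p for p in p1]

    # Initialization: P_ij^(x) starts at the a priori probability of bit j.
    P0 = {(i, j): p0[j] for i in range(m) for j in J[i]}
    P1 = {(i, j): p1[j] for i in range(m) for j in J[i]}

    for iteration in range(1, max_iter + 1):
        # Horizontal step: run along the rows (checks) of A.
        Q0, Q1 = {}, {}
        for i in range(m):
            for j in J[i]:
                dQ = prod(P0[i, k] - P1[i, k] for k in J[i] if k != j)
                Q0[i, j] = 0.5 * (1.0 + dQ)
                Q1[i, j] = 0.5 * (1.0 - dQ)

        # Vertical step: run along the columns (bits) of A.
        estimate = []
        for j in range(n):
            for i in I[j]:
                a0 = p0[j] * prod(Q0[k, j] for k in I[j] if k != i)
                a1 = p1[j] * prod(Q1[k, j] for k in I[j] if k != i)
                P0[i, j], P1[i, j] = a0 / (a0 + a1), a1 / (a0 + a1)
            # Pseudo-posterior probabilities use every check on bit j.
            b0 = p0[j] * prod(Q0[i, j] for i in I[j])
            b1 = p1[j] * prod(Q1[i, j] for i in I[j])
            estimate.append(1 if b1 > b0 else 0)

        # Stop as soon as the tentative estimate satisfies every check.
        if all(sum(estimate[j] for j in J[i]) % 2 == 0 for i in range(m)):
            return estimate, iteration
    return None, max_iter

# Toy example (hypothetical): two checks on three bits, codewords 000 and 111.
A = [[1, 1, 0],
     [0, 1, 1]]
decoded, iterations = ldpc_sum_product(A, [0.7, 0.1, 0.2])
```

In this toy case the hard decisions on the priors alone would read 100, which violates the first check; the message exchange pulls the first bit back to 0. Practical decoders usually work with log-likelihood ratios for numerical stability, which this sketch does not attempt.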

Simply stated, the sum-product algorithm passes probabilistic quantities between the 
check nodes and variable nodes of the Tanner graph. Because each parity-check 
constraint can be represented by a simple convolutional coder with one bit of memory, 
LDPC decoders are simpler to implement than turbo decoders, as stated earlier in the 
section. 

In terms of performance, however, we may say the following in light of experimental 
results reported in the literature: 


Thus far in this section, we have focused attention on regular LDPC codes, which 
distinguish themselves in the following way: referring to the Tanner (bipartite) graph in 
Figure 10.36, all variable nodes on the left-hand side of the graph have the same degree 
and likewise for the check nodes on the right-hand side of the graph. 

To go beyond the performance attainable with regular LDPC codes and thereby come 
increasingly closer to the Shannon limit, we look to irregular LDPC codes, in the context 
of which we introduce the following definition: 


To be specific, an irregular LDPC code distinguishes itself from its regular counterpart in 
that its Tanner graph involves the following two degree distributions: 

The degree distribution of the variable nodes in the Tanner graph of an irregular 
LDPC code is described by 

$$ \lambda(X) = \sum_{d=1}^{d_N} \lambda_d X^{d-1} \qquad (10.145) $$

where X denotes a node variable in the code’s Tanner graph, $\lambda_d$ denotes the fraction 
of variable nodes with degree d in the graph, and $d_N$ denotes the maximum degree of 
a variable node in the graph. 

Correspondingly, the degree distribution of the check nodes in the irregular code’s 
Tanner graph is described by 

$$ \rho(X) = \sum_{d=1}^{d_c} \rho_d X^{d-1} \qquad (10.146) $$

where X denotes the check node in the code’s Tanner graph, $\rho_d$ denotes the fraction 
of check nodes with degree d in the graph, and $d_c$ denotes the maximum degree of a 
check node in the graph. 

The irregular LDPC code embodies the regular LDPC code as a special case. Specifically, 
(10.145) and (10.146) simplify as follows for the variable and check nodes of a regular 
LDPC code, respectively: 

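$$ \lambda(X) = X^{\omega_N - 1} \quad \text{for } \lambda_d = 1 \text{ and } d_N = \omega_N $$

and 

$$ \rho(X) = X^{\omega_c - 1} \quad \text{for } \rho_d = 1 \text{ and } d_c = \omega_c $$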

By exploiting the two degree distributions of (10.145) and (10.146) for the variable and 
check nodes, respectively, irregular LDPC codes are commonly constructed on the basis of 
their Tanner graphs. Such an approach is exemplified by the irregular LDPC codes 
reported in Richardson et al. (2001), and Richardson and Urbanke (2001). 
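
As a small illustration of how a degree distribution reduces to the regular case, the following Python sketch evaluates a polynomial of the form sum over d of (fraction of nodes with degree d) times X^(d-1). The function name degree_polynomial and the example fractions are hypothetical and chosen only for illustration; the exponent d - 1 is assumed from the regular special case given above.

```python
def degree_polynomial(fractions):
    """Return X -> sum_d fractions[d] * X**(d - 1) for a {degree: fraction} map."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("degree fractions must sum to 1")
    return lambda X: sum(f * X ** (d - 1) for d, f in fractions.items())

# Regular case: every variable node has the same degree (here 3),
# so the polynomial collapses to the single term X**2.
regular = degree_polynomial({3: 1.0})

# Irregular case (hypothetical): half the nodes have degree 2, half degree 3.
irregular = degree_polynomial({2: 0.5, 3: 0.5})
```

Evaluating both at the same X makes the difference visible: the regular polynomial has one term, whereas the irregular one is a weighted mixture of powers.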

Trellis-Coded Modulation 


In the different approaches to channel coding described up to this point in the chapter, the 
one common feature that describes them all may be summarized as follows: 


Moreover, error control is provided by transmitting additional redundant bits in the code, 
which has the effect of lowering the information bit rate per channel bandwidth. That is, 
bandwidth efficiency is traded for increased power efficiency. 

To attain a more effective utilization of available resources, namely bandwidth and 
power, coding and modulation would have to be treated as a combined (single) entity. We 
may deal with this new paradigm by invoking the statement 


Indeed, this definition includes the traditional idea of parity-check coding. 

Trellis codes for band-limited channels result from the treatment of modulation and 
coding as a combined entity rather than as two separate operations. The combination itself 
is referred to as trellis-coded modulation (TCM). This form of signaling has three basic 
requirements: 

1. The number of signal points in the constellation used is larger than what is required for 
the modulation format of interest with the same data rate; the additional signal points 
allow redundancy for forward error-control coding without sacrificing bandwidth. 

2. Convolutional coding is used to introduce a certain dependency between successive 
signal points, such that only certain patterns or sequences of signal points are 
permitted for transmission. 

3. Soft-decision decoding is performed in the receiver, in which the permissible 
sequence of signals is modeled as a trellis structure; hence the name trellis codes. 

Requirement 3 is the result of using an enlarged signal constellation. By increasing the size 
of the constellation, the probability of symbol error increases for a fixed SNR. Hence, with 
hard-decision demodulation, we would face a performance loss before we begin. Performing 
soft-decision decoding on the combined code and modulation trellis ameliorates this problem. 

In an AWGN channel, we look to the following approach: 


Thus, in the design of trellis codes, the emphasis is on maximizing the Euclidean distance 
between code vectors (equivalently codewords) rather than maximizing the Hamming 
distance of an error-correcting code. The reason for this approach is that, except for 
conventional coding with binary PSK and QPSK, maximizing the Hamming distance is 
not the same as maximizing the squared Euclidean distance. Accordingly, in what follows, 
the Euclidean distance between code vectors is adopted as the distance measure of 
interest. Moreover, while a more general treatment is possible, the discussion is (by 
choice) confined to the case of two-dimensional constellations of signal points. The 
implication of such a choice is to restrict the development of trellis codes to multilevel 
amplitude and/or phase modulation schemes such as M-ary PSK and M-ary QAM. 

Two-level Partitioning of 8-PSK Constellation 

The approach used to design this restricted type of trellis codes involves partitioning an 
M-ary constellation of interest successively into 2, 4, 8, ... subsets with size M/2, M/4, 
M/8, ..., and having progressively larger minimum Euclidean distance 
between their respective signal points. Such a design approach by set-partitioning 
represents the key idea in the construction of efficient coded modulation techniques for 
band-limited channels. 

In Figure 10.37, we illustrate the partitioning procedure by considering a circular 
constellation that corresponds to 8-PSK. The figure depicts the constellation itself and the 
two and four subsets resulting from two levels of partitioning. These subsets share the 
common property that the minimum Euclidean distances between their individual points 
follow an increasing pattern, namely: 

$$ d_0 < d_1 < d_2 $$
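
The increasing distances can be checked numerically. The sketch below assumes a unit-radius 8-PSK constellation and partitions it by taking alternate points at each level; the helper name min_distance and the variable names are illustrative choices, not part of the original text.

```python
import cmath
import math
from itertools import combinations

def min_distance(points):
    """Smallest Euclidean distance between any two points of a set."""
    return min(abs(a - b) for a, b in combinations(points, 2))

# Unit-radius 8-PSK constellation (assumed normalization).
psk8 = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]

# Level 1: two subsets of four points (alternate points of the circle).
level1 = [psk8[0::2], psk8[1::2]]
# Level 2: four subsets of two antipodal points each.
level2 = [s[k::2] for s in level1 for k in range(2)]

d0 = min_distance(psk8)                      # 2 sin(pi/8), about 0.765
d1 = min(min_distance(s) for s in level1)    # sqrt(2)
d2 = min(min_distance(s) for s in level2)    # 2
```

With this normalization the three distances come out as 2 sin(π/8), √2, and 2, so each level of partitioning enlarges the minimum distance within a subset.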


Three-level Partitioning of QAM Constellation 

For a different two-dimensional example, Figure 10.38 illustrates the partitioning of a 
rectangular constellation corresponding to 16-QAM. Here again, we see that the subsets 
have increasing within-subset Euclidean distances, as shown by 

$$ d_0 < d_1 < d_2 < d_3 $$

Partitioning of 16-QAM constellation, which shows that $d_0 < d_1 < d_2 < d_3$. 


Based on the subsets resulting from successive partitioning of a two-dimensional 
constellation, illustrated in Examples 11 and 12, we may devise relatively simple, yet highly 
effective coding schemes. Specifically, to send n bits/symbol with quadrature modulation 
(i.e., one that has in-phase and quadrature components), we start with a two-dimensional 
constellation of $2^{n+1}$ signal points appropriate for the modulation format of interest; a 
circular grid is used for M-ary PSK and a rectangular one for M-ary QAM. In any event, the 
constellation is partitioned into four or eight subsets. One or two incoming message bits per 
symbol enter a rate-1/2 or rate-2/3 binary convolutional encoder, respectively; the resulting 
two or three coded bits per symbol determine the selection of a particular subset. The 
remaining uncoded message bits determine which particular signal point from the selected 
subset is to be signaled over the AWGN channel. This class of trellis codes is known as 
Ungerboeck codes in recognition of their originator. 

Since the modulator has memory, we may use the Viterbi algorithm (discussed in 
Section 10.8) to perform maximum likelihood sequence estimation at the receiver. Each 
branch in the trellis of the Ungerboeck code corresponds to a subset rather than an 
individual signal point. The first step in the detection is to determine the signal point 
within each subset that is closest to the received signal point in the Euclidean sense. The 
signal point so determined and its metric (i.e., the squared Euclidean distance between it 
and the received point) may be used thereafter for the branch in question, and the Viterbi 
algorithm may then proceed in the usual manner. 
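
The first detection step, picking the closest signal point within each subset and keeping its squared Euclidean distance as the branch metric, might look as follows. This is a sketch under stated assumptions: a unit-radius 8-PSK constellation split into four antipodal two-point subsets, a hypothetical received sample, and an illustrative function name subset_metrics.

```python
import cmath
import math

def subset_metrics(received, subsets):
    """For each subset, return (nearest point, squared Euclidean distance)."""
    metrics = []
    for points in subsets:
        nearest = min(points, key=lambda s: abs(received - s) ** 2)
        metrics.append((nearest, abs(received - nearest) ** 2))
    return metrics

# Four two-point subsets of unit-radius 8-PSK (antipodal pairs).
psk8 = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]
subsets = [[psk8[k], psk8[k + 4]] for k in range(4)]

# Hypothetical received sample close to the signal point at angle 0.
metrics = subset_metrics(0.9 + 0.1j, subsets)
```

Each entry of metrics would then serve as the metric of every trellis branch labeled with that subset, after which the Viterbi recursion proceeds as for an ordinary convolutional code.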


The scheme of Figure 10.39a depicts the simplest Ungerboeck 8-PSK code for the 
transmission of 2 bits/symbol. The scheme uses a rate-1/2 convolutional encoder; the 
corresponding trellis of the code is shown in Figure 10.39b, which has four states. Note 
that the most significant bit of the incoming message sequence is left uncoded. Therefore, 
each branch of the trellis may correspond to two different output values of the 8-PSK 
modulator or, equivalently, to one of the four two-point subsets shown in Figure 10.37. 
The trellis of Figure 10.39b also includes the minimum distance path. 

The scheme of Figure 10.40a depicts another Ungerboeck 8-PSK code for transmitting 
2 bits/symbol; it is next in order of increasing complexity, compared to the scheme of 
Figure 10.39a. This second scheme uses a rate-2/3 convolutional encoder. Therefore, the 
corresponding trellis of the code has eight states, as shown in Figure 10.40b. In this latter 
scheme, both bits of the incoming message sequence are encoded. Hence, each branch of 
the trellis corresponds to a specific output value of the 8-PSK modulator. The trellis of 
Figure 10.40b also includes the minimum distance path. 

Figures 10.39b and 10.40b also include the pertinent encoder states. In Figure 10.39a, 
the state of the encoder is defined by the contents of the two-stage shift register. On the 
other hand, in Figure 10.40a, it is defined by the content of the single-stage (top) shift 
register followed by that of the two-stage (bottom) shift register. 


Following the discussion in Section 10.8 on maximum likelihood decoding of 
convolutional codes, we define the asymptotic coding gain of Ungerboeck codes as 
follows: 

$$ G_a = 10 \log_{10}\left( \frac{d_{\text{free}}^2}{d_{\text{ref}}^2} \right) \qquad (10.149) $$

where $d_{\text{free}}$ is the free Euclidean distance of the code and $d_{\text{ref}}$ is the minimum Euclidean 
distance of an uncoded modulation scheme operating with the same signal energy per bit. 
For example, by using the Ungerboeck 8-PSK code of Figure 10.39a, the signal 
constellation has eight message points and we send two message bits per signal point. 
Hence, uncoded transmission requires a signal constellation with four message points. We 
may therefore regard uncoded 4-PSK as the frame of reference for the Ungerboeck 8-PSK 
code of Figure 10.39a. 

(a) Four-state Ungerboeck code for 8-PSK, built around a rate-1/2 convolutional encoder; the mapper follows Figure 10.37. 
(b) Trellis of the code. 

The Ungerboeck 8-PSK code of Figure 10.39a achieves an asymptotic coding gain of 
3 dB, which is calculated as follows: 

Each branch of the trellis in Figure 10.39b corresponds to a subset of two antipodal 
signal points. Hence, the free Euclidean distance $d_{\text{free}}$ of the code can be no larger 
than the Euclidean distance $d_2$ between the antipodal signal points of such a subset. 
We may therefore write 

$$ d_{\text{free}} = d_2 = 2 $$

where the distance $d_2$ is defined in Figure 10.41a. 

(a) Eight-state Ungerboeck code for 8-PSK, built around a rate-2/3 convolutional encoder; the mapper follows Figure 10.37. 
(b) Trellis of the code with only some of the branches shown; the encoder states, from top to bottom, are 000, 010, 100, 110, 001, 011, 101, 111. 

Signal-space diagrams for calculation of asymptotic coding gain 
of Ungerboeck 8-PSK code: (a) definition of distance $d_2$; (b) definition of 
reference distance $d_{\text{ref}}$. 


From Figure 10.41b, we see that the minimum Euclidean distance of an uncoded 
QPSK, viewed as the frame of reference operating with the same signal energy per 
bit, assumes the following value: 

$$ d_{\text{ref}} = \sqrt{2} $$

Hence, as previously stated, the use of (10.149) yields an asymptotic coding gain of 

$$ 10 \log_{10} 2 = 3 \text{ dB} $$
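
The arithmetic behind the 3 dB figure can be reproduced directly. The sketch below assumes unit-radius constellations, so that the antipodal distance is 2 and the minimum distance of uncoded QPSK is √2; the function name is an illustrative choice.

```python
import math

def asymptotic_coding_gain_db(d_free, d_ref):
    """10 log10 of the ratio of squared free distance to squared reference distance."""
    return 10.0 * math.log10(d_free ** 2 / d_ref ** 2)

# Four-state Ungerboeck 8-PSK code against uncoded QPSK (unit-radius assumption).
gain_db = asymptotic_coding_gain_db(2.0, math.sqrt(2.0))
```

The ratio of squared distances is 4/2 = 2, which corresponds to approximately 3.01 dB.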

The asymptotic coding gain achievable with Ungerboeck codes increases with the number 
of states in the convolutional encoder. Table 10.8 presents the asymptotic coding gain (in 
dB) for Ungerboeck 8-PSK codes for increasing number of states, expressed with respect 
to uncoded 4-PSK. Note that improvements on the order of 6 dB require codes with a very 
large number of states. 

Asymptotic coding gain of Ungerboeck 8-PSK codes, with 
respect to uncoded 4-PSK 

Number of states    4     8     16    32    64    128   256   512
Coding gain (dB)    3     3.6   4.1   4.6   4.8   5

Turbo Decoding of Serial Concatenated Codes 


In Section 10.12 we pointed out that there are two types of concatenated codes: parallel 
and serial. The original turbo coding scheme involved a parallel concatenated code, since 
the two encoders operate in parallel on the same set of message bits. We now turn our 
attention in this section to a serial concatenation scheme as depicted in Figure 10.42, 
comprising an “outer” encoder whose output feeds an “inner” encoder. Whereas the 
serial concatenation idea can be traced to as early as Shannon’s seminal work, the 
connection with turbo coding occurred only after the parallel concatenated scheme of 
Berrou et al. (see Section 10.12) gained widespread acclaim. The iterative decoding 
algorithm for the serial concatenated scheme was first analyzed in detail by Benedetto and 
coworkers (Benedetto and Montorsi, 1996; Benedetto et al., 1998); the algorithm follows 
a similar logic to the parallel concatenated scheme, in the form of information exchange 
between the two decoders as in Figure 10.43. This iterative information exchange is 
observed to significantly improve the overall error-correction abilities of the decoder, just 
as in the conventional turbo decoder. We shall review the basics of the iterative decoding 
algorithm in what follows in order to emphasize the common points with the iterative 
algorithm described in Section 10.12. 

Serial concatenated codes; as usual, $\pi$ denotes an interleaver. 

The particular interest in the serial concatenated scheme, however, becomes apparent 
once we recognize that the inner encoder-decoder pair need not be a conventional 
error-correction code, but in fact may assume more general forms that are often encountered in 
communication systems. A few examples may be highlighted as follows: 

1. The inner encoder may in fact be a TCM stage, as studied in Section 10.15. The 
iterative decoding algorithm connecting the trellis-coded demodulator with the outer 
error-correction code leads to turbo TCM. 

2. The inner encoder may be the communication channel itself, which is of interest 
when the channel induces ISI. The output symbols of the channel may then be 
expressed as a convolution between the input symbol sequence and the channel 
impulse response, and the decoder operation corresponds to channel equalization 
(Chang and Hancock, 1966). Combining the equalizer with the outer channel 
decoder gives rise to turbo equalization. 

3. In multi-user communication systems, the inner encoder may represent a single 
user’s access point to a shared channel through, say, direct-sequence CDMA, in 
which users sharing a channel are distinguished by an assigned repetition code. The 
inner decoder is a multiuser detector, which aims to separate the multiple users into 
distinct symbol streams; when combined with the outer decoder using information 
exchange, a turbo CDMA system results. 

Iterative decoder structure. Key: L = likelihood function; A = a priori probabilities; E = extrinsic probabilities. 

The above list is by no means exhaustive, but merely represents some of the more 
commonly studied variants of iterative receiver design. We will focus here on the basic 
iterative decoding scheme using an error-correction code for the inner encoder-decoder 
pair, and then briefly illustrate its applications to turbo equalization. 


Consider first the case in which both encoders in the serial concatenation of Figure 10.42 
implement forward error-correction coding. For efficiency reasons, we assume that the 
outer encoder implements a systematic code, so that the codeword c it produces appears as 
follows: 

c = [ b | m ] 

in which m contains the k message bits and b contains the n - k parity-check bits. By 
choosing a recursive systematic encoder, the corresponding decoding operation can 
exploit the BCJR algorithm discussed in Section 10.9. 

The second, or “inner,” encoder is also based on a trellis code (although not necessarily 
systematic) so that it, too, will admit an efficient decoder using the MAP decoding 
algorithm. As illustrated in Figure 10.42, the inner encoder also integrates an interleaver, 
denoted by $\pi$, which permutes the order of the bits in the code vector c prior to the second 
encoding operation. Without this interleaver, the serial concatenation of two trellis codes 
would merely give a larger-dimension trellis code having limited error-correction 
capabilities. The inclusion of the interleaver alters markedly the minimum distance 
properties of the code, and constitutes an essential ingredient in obtaining a good 
error-correction code. 


The output from the inner encoder is sent across the channel, which may be a binary 
symmetric channel or an AWGN channel, to produce the received vector r. The simplest 
way to decode the received signal is to cascade the corresponding inner and outer 
decoders. A refined approach is to allow information exchange between the two decoders, 
to trigger the turbo effect; this idea is illustrated in Figure 10.43, and the manner of 
information exchange is developed in what follows. 

To begin, the inner decoder aims to obtain the bitwise a posteriori probability ratios 

$$ \frac{P(c_i = +1 \mid \mathbf{r})}{P(c_i = -1 \mid \mathbf{r})} $$

As $P(c_i \mid \mathbf{r})$ is a marginal probability calculated from the conditional probability $P(\mathbf{c} \mid \mathbf{r})$, the 
bit-wise a posteriori probability ratio may be developed into the new form 

$$
\begin{aligned}
\frac{P(c_i = +1 \mid \mathbf{r})}{P(c_i = -1 \mid \mathbf{r})}
&= \frac{\sum_{\mathbf{c}:\, c_i = +1} P(\mathbf{c} \mid \mathbf{r})}{\sum_{\mathbf{c}:\, c_i = -1} P(\mathbf{c} \mid \mathbf{r})} \\
&= \frac{\sum_{\mathbf{c}:\, c_i = +1} P(\mathbf{r} \mid \mathbf{c})\, P(\mathbf{c})}{\sum_{\mathbf{c}:\, c_i = -1} P(\mathbf{r} \mid \mathbf{c})\, P(\mathbf{c})}
\end{aligned}
\qquad (10.152)
$$

In the second line of (10.152), we used Bayes' rule 

$$ P(\mathbf{c} \mid \mathbf{r}) = \frac{P(\mathbf{r} \mid \mathbf{c})\, P(\mathbf{c})}{P(\mathbf{r})} $$

to expose the likelihood function $P(\mathbf{r} \mid \mathbf{c})$ as well as the a priori probability function $P(\mathbf{c})$; 
the term $P(\mathbf{r})$ is common to the numerator and denominator, and so cancels in the ratio. 

We now make the assumption that the a priori probability $P(\mathbf{c})$ factors into the product 
of its marginals; that is, 

$$ P(\mathbf{c}) = P(c_1)\, P(c_2) \cdots P(c_n) $$

Strictly speaking, this is incorrect, since c contains both the message bits m and 
parity-check bits b from the outer encoder, and we know that the message bits determine the 
parity-check bits once the outer encoder is specified. The reason for invoking this 
assumption is to facilitate decoding via the BCJR algorithm. In particular, inserting this 
factorization of the a priori probability function into the a posteriori probability ratio, we 
may continue our development as shown by 

$$
\begin{aligned}
\frac{P(c_i = +1 \mid \mathbf{r})}{P(c_i = -1 \mid \mathbf{r})}
&= \frac{\sum_{\mathbf{c}:\, c_i = +1} P(\mathbf{r} \mid \mathbf{c}) \prod_{j=1}^{n} P(c_j)}{\sum_{\mathbf{c}:\, c_i = -1} P(\mathbf{r} \mid \mathbf{c}) \prod_{j=1}^{n} P(c_j)} \\
&= \underbrace{\frac{P(c_i = +1)}{P(c_i = -1)}}_{\text{prior ratio}} \times
\underbrace{\frac{\sum_{\mathbf{c}:\, c_i = +1} P(\mathbf{r} \mid \mathbf{c}) \prod_{j=1,\, j \neq i}^{n} P(c_j)}{\sum_{\mathbf{c}:\, c_i = -1} P(\mathbf{r} \mid \mathbf{c}) \prod_{j=1,\, j \neq i}^{n} P(c_j)}}_{\text{extrinsic information ratio}},
\qquad i = 1, 2, \ldots, n
\end{aligned}
\qquad (10.154)
$$

We obtained the second line in (10.154) by noting that each term in the numerator 
contains the factor $P(c_i = +1)$ and, similarly, each term in the denominator contains the 
factor $P(c_i = -1)$; hence the prior ratio factors out of the expression. The 
remaining term is the extrinsic information ratio for bit $c_i$ from the inner decoder. 
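
The factorization of the a posteriori ratio into a prior ratio and an extrinsic information ratio can be verified by brute-force enumeration on a toy example. In the sketch below, the inner code (two bits plus their product as a parity symbol, in the ±1 domain), the received samples, the unit noise variance, and the prior values are all hypothetical choices made for illustration; exhaustive summation stands in for the BCJR algorithm, which computes the same quantities efficiently on a trellis.

```python
from itertools import product
from math import exp, prod

def probability_ratios(likelihood, prior_plus, i):
    """Return (posterior ratio, prior ratio, extrinsic ratio) for bit i.

    likelihood : function mapping a tuple c of +1/-1 values to P(r | c).
    prior_plus : prior_plus[j] is the a priori probability that c_j = +1.
    """
    n = len(prior_plus)

    def prior(j, value):
        return prior_plus[j] if value == +1 else 1.0 - prior_plus[j]

    def weighted_sum(value, skip_bit_i):
        # Sum over all c with c_i = value; optionally leave out the prior of bit i.
        return sum(
            likelihood(c) * prod(prior(j, c[j]) for j in range(n)
                                 if not (skip_bit_i and j == i))
            for c in product((+1, -1), repeat=n) if c[i] == value)

    posterior = weighted_sum(+1, False) / weighted_sum(-1, False)
    prior_ratio = prior(i, +1) / prior(i, -1)
    extrinsic = weighted_sum(+1, True) / weighted_sum(-1, True)
    return posterior, prior_ratio, extrinsic

r = (0.8, -0.3, 0.5)  # hypothetical received samples

def likelihood(c):
    # Toy inner code: transmit (c_1, c_2, c_1 * c_2) over unit-variance Gaussian noise.
    x = (c[0], c[1], c[0] * c[1])
    return exp(-sum((rk - xk) ** 2 for rk, xk in zip(r, x)) / 2.0)

posterior, prior_ratio, extrinsic = probability_ratios(likelihood, [0.6, 0.3], 0)
```

For any choice of likelihood and priors, posterior agrees with prior_ratio * extrinsic up to rounding, which is the content of the second line of the factorization.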

To facilitate passing information to the outer decoder, we may interpret each extrinsic 
information ratio from the inner decoder as the probability ratio of an auxiliary probability 
mass function $T(\mathbf{c})$ that fulfills two properties: 

• The probability mass function $T(\mathbf{c})$ factors into its bitwise marginal functions 
according to 

$$ T(\mathbf{c}) = T_1(c_1)\, T_2(c_2) \cdots T_n(c_n) $$

• The bitwise marginal evaluations each sum to one, $T_i(+1) + T_i(-1) = 1$, and are 
chosen such that their ratios match the extrinsic information ratios from the inner 
decoder: 

$$
\frac{T_i(c_i = +1)}{T_i(c_i = -1)} =
\frac{\sum_{\mathbf{c}:\, c_i = +1} P(\mathbf{r} \mid \mathbf{c}) \prod_{j=1,\, j \neq i}^{n} P(c_j)}{\sum_{\mathbf{c}:\, c_i = -1} P(\mathbf{r} \mid \mathbf{c}) \prod_{j=1,\, j \neq i}^{n} P(c_j)},
\qquad i = 1, 2, \ldots, n
$$

Now, we note that by taking natural logarithms, the log extrinsic ratio becomes 

$$ \ln\left[ \frac{T_i(+1)}{T_i(-1)} \right] = L_p(c_i) - L_a(c_i) $$

where $L_p(c_i)$ is the log posterior ratio and $L_a(c_i)$ is the log prior ratio. 

Next, we note that the outer decoder does not have the usual channel likelihood 
evaluations available, but must instead take information from the inner decoder. While 
many possibilities in this direction may be envisaged, a successful iterative decoding 
algorithm results by replacing the a posteriori probability according to 

$$ P(\mathbf{c} \mid \mathbf{r}) \leftarrow \phi(\mathbf{c})\, T(\mathbf{c}) $$

in which $\phi(\mathbf{c})$ is the indicator function for the outer code, that is 

$$
\phi(\mathbf{c}) =
\begin{cases}
1, & \text{if } \mathbf{c} \text{ is a code vector} \\
0, & \text{otherwise}
\end{cases}
$$

We may think of the function $\phi(\mathbf{c})$ as replacing the conventional channel likelihood 
function $P(\mathbf{r} \mid \mathbf{c})$, since it vanishes whenever c is not a code vector, and $T(\mathbf{c}) = T_1(c_1) \cdots T_n(c_n)$ 
as replacing the a priori probability on each bit, since it factors into the product of 
its marginals. The conventional posterior probability ratio for the outer decoder is thus 
replaced with 

$$
\begin{aligned}
\frac{P(c_i = +1 \mid \mathbf{r})}{P(c_i = -1 \mid \mathbf{r})}
&\leftarrow \frac{\sum_{\mathbf{c}:\, c_i = +1} \phi(\mathbf{c}) \prod_{j=1}^{n} T_j(c_j)}{\sum_{\mathbf{c}:\, c_i = -1} \phi(\mathbf{c}) \prod_{j=1}^{n} T_j(c_j)} \\
&= \underbrace{\frac{T_i(c_i = +1)}{T_i(c_i = -1)}}_{\text{prior ratio}} \times
\underbrace{\frac{\sum_{\mathbf{c}:\, c_i = +1} \phi(\mathbf{c}) \prod_{j=1,\, j \neq i}^{n} T_j(c_j)}{\sum_{\mathbf{c}:\, c_i = -1} \phi(\mathbf{c}) \prod_{j=1,\, j \neq i}^{n} T_j(c_j)}}_{\text{extrinsic information ratio}},
\qquad i = 1, 2, \ldots, n
\end{aligned}
\qquad (10.160)
$$

In (10.160) the second line is obtained upon noting that each term in the numerator 
(denominator) contains a factor $T_i(c_i = +1)$ ($T_i(c_i = -1)$). This separates the “pseudo-prior” 
ratio from the outer decoder’s extrinsic information ratio. 
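
To make the role of the indicator function concrete, the following sketch computes the outer decoder's extrinsic information ratio by summing only over code vectors, which is equivalent to weighting every vector by the indicator function. The length-3 repetition code, the marginal values, and the function names are hypothetical and serve only as an illustration; a practical decoder would run the BCJR algorithm on the code trellis instead of enumerating the codebook.

```python
from math import prod

def outer_extrinsic_ratio(codebook, T_plus, i):
    """Extrinsic ratio for bit i; T_plus[j] is the marginal value T_j(+1)."""
    def T(j, value):
        return T_plus[j] if value == +1 else 1.0 - T_plus[j]

    def partial_sum(value):
        # Only code vectors contribute, since the indicator is zero elsewhere.
        return sum(prod(T(j, c[j]) for j in range(len(c)) if j != i)
                   for c in codebook if c[i] == value)

    return partial_sum(+1) / partial_sum(-1)

def marginal_from_ratio(ratio):
    """Marginal value at +1 whose ratio to the value at -1 equals `ratio`."""
    return ratio / (1.0 + ratio)

# Hypothetical outer code: length-3 repetition code in the +1/-1 domain.
codebook = [(+1, +1, +1), (-1, -1, -1)]
ratio = outer_extrinsic_ratio(codebook, [0.5, 0.8, 0.6], 0)
u_plus = marginal_from_ratio(ratio)
```

Here bit 0 carries no information of its own (its marginal is 0.5), yet the other two bits push its extrinsic ratio to (0.8)(0.6)/((0.2)(0.4)) = 6, which is the kind of information that is fed back to the inner decoder as a new prior.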

Now, to couple the information from the outer decoder back to the inner decoder, we 
map the outer decoder’s extrinsic information values to a probability mass function 
$U(\mathbf{c})$ which, akin to $T(\mathbf{c})$ introduced above, fulfills two properties: 

• The probability mass function $U(\mathbf{c})$ factors into the product of its bitwise marginal 
functions according to 

$$ U(\mathbf{c}) = U_1(c_1)\, U_2(c_2) \cdots U_n(c_n) $$

• The bitwise marginal evaluations each sum to one, $U_i(+1) + U_i(-1) = 1$, and are chosen 
such that their ratios match the extrinsic information ratios for the outer decoder: 

$$
\frac{U_i(c_i = +1)}{U_i(c_i = -1)} =
\frac{\sum_{\mathbf{c}:\, c_i = +1} \phi(\mathbf{c}) \prod_{j=1,\, j \neq i}^{n} T_j(c_j)}{\sum_{\mathbf{c}:\, c_i = -1} \phi(\mathbf{c}) \prod_{j=1,\, j \neq i}^{n} T_j(c_j)},
\qquad i = 1, 2, \ldots, n
$$

The marginal probability functions $U_i(c_i)$ then replace the a priori probability values $P(c_i)$ 
in the inner decoder, and the procedure iterates, thus defining the turbo decoder. In this 
fashion, we say the following: 

In high-speed communication systems, further signal degradation can come from 
multipath artifacts in wireless environments, or reflection artifacts due to impedance 
mismatches in wireline systems. When such degradations have a temporal duration 
commensurate with the symbol period, the signal impinging on the receiver at a given 
sample instant is the composite influence of successive transmitted symbols, giving rise to 
ISI. Its severity worsens as the symbol period diminishes relative to the “delay spread” of 
the system, meaning that higher rate data systems must contend with ISI as a significant 
channel distortion mechanism. 

In such cases, the received symbol $r_i$ at sample instant i is a weighted combination of a 
set of successive transmitted symbols, according to 

$$ r_i = \sum_{k=0}^{L-1} h_k s_{i-k} + \nu_i $$

Here, $\nu_i$ is the additive background noise, $\{s_i\}$ is the transmitted symbol sequence 
obtained by interleaving the bit sequence $\{c_i\}$ (and possibly followed by symbol mapping 
if using TCM), and $\{h_0, h_1, \ldots, h_{L-1}\}$ is the channel impulse response of length L. 

If we consider a simple case in which each $s_i$ is antipodal ($s_i = \pm 1$) and k = 0, 1, 2, we 
see that the noise-free channel outputs can be obtained through the trellis graph of Figure 
10.44: The transitions are determined by whether the input symbol is $s_i = +1$ or $s_i = -1$, 
while the noise-free outputs are drawn from a finite set comprised of sums and differences 
of the channel impulse response coefficients. Thus, a convolutional channel which induces 
ISI may itself be viewed as a trellis code, and the BCJR algorithm may be applied directly 
to estimate the a posteriori probabilities of the transmitted symbols, and thus of the 
codeword bits $\{c_i\}$. The net result is that we have the traditional MAP equalizer. 

A turbo equalizer results upon noting that the convolutional channel and its MAP 
equalizer may be viewed as the inner encoder-decoder pair of a serial cascade scheme, 
albeit one dictated by the communication channel and thus beyond the designer’s control. 
The outer encoder may again be chosen as a recursive systematic trellis code, whose 
decoder is coupled with the MAP equalizer in precisely the same manner: the extrinsic 
probabilities from one decoder are used in place of the a priori probabilities of the other, 
resulting in an iterative decoding and equalization scheme. 
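
The trellis view of a three-tap ISI channel can be made concrete by enumerating its noise-free outputs. In the sketch below the tap values are hypothetical, and the function names noise_free_outputs and next_state are illustrative; the state is taken to be the pair of previous inputs, which is one natural convention rather than the only one.

```python
from itertools import product

def noise_free_outputs(h):
    """Map each input tuple (s_i, s_{i-1}, ..., s_{i-L+1}) to sum_k h[k] * s_{i-k}."""
    return {s: sum(hk * sk for hk, sk in zip(h, s))
            for s in product((+1, -1), repeat=len(h))}

def next_state(state, s_new):
    """Shift the newest input into the state (s_{i-1}, ..., s_{i-L+1})."""
    return (s_new,) + state[:-1]

h = (1.0, 0.5, 0.25)            # hypothetical three-tap impulse response
outputs = noise_free_outputs(h)  # eight branch labels: all sums and differences of taps
```

With three taps there are four states and eight branches, one branch per combination of current input and state; each branch label is a signed sum of the tap values, which is exactly the finite output set that the BCJR algorithm works with in a MAP equalizer.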


Trellis graph for a three-tap channel model, with transition branches 
listing the noise-free channel outputs: 
$\chi_0 = h_0 + h_1 + h_2$, $\chi_1 = h_0 + h_1 - h_2$, $\chi_2 = h_0 - h_1 + h_2$, $\chi_3 = h_0 - h_1 - h_2$, 
$\chi_4 = -h_0 + h_1 + h_2$, $\chi_5 = -h_0 + h_1 - h_2$, $\chi_6 = -h_0 - h_1 + h_2$, $\chi_7 = -h_0 - h_1 - h_2$. 
Solid transitions occur when the channel input is $s_i = +1$; dashed transitions occur 
when $s_i = -1$. 

Summary and Discussion 


In this rather long chapter we studied error-control coding techniques that have established 
themselves as indispensable tools for reliable digital communication over noisy channels. 
The effect of errors occurring during transmission is reduced by adding redundancy to the 
data prior to transmission in a controlled manner. The redundancy is used to enable a 
decoder in the receiver to detect and correct errors. 

Regardless of how they are designed, error-control coding schemes rely on Shannon’s 
1948 landmark paper, particularly the celebrated coding theorem, which asserts the 
following statement: 


The coding theorem was discussed in Chapter 5 on information theory. Restating it here, 
for the last time in the final chapter of the book, is intended to emphasize the importance 
of the theorem, which will last forever. 

In a historical context, error-control coding schemes may be divided into two broadly 
defined families: 

Legacy Codes 

As the name would imply, the family of legacy codes embodies several kinds of 
linear codes that originated in 1950 and, in the course of three decades or so, 
broadened its scope in depth as well as breadth. A distinctive feature of legacy codes 
is that of exploiting abstract algebraic structures built into their design in different 
ways and with increasing mathematical abstraction. 

Specifically, legacy codes cover the following four schemes: 

1. Linear block codes, the first kind of which were described independently by Golay 
in 1949 and Hamming in 1950. Hamming codes are simple to construct and just as 
easy to decode using a look-up table based on the notion of syndrome. Because 
of their computational simplicity and their ability to operate at high data rates, 
Hamming codes are widely used in digital communications. 

2. Cyclic codes, which form an important subclass of linear block codes. Indeed, 
many of the block codes used in practice are cyclic codes for two compelling 
reasons: 

• The use of linear feedback shift registers for encoding and syndrome 
computation. 

• The inherent algebraic structure used to develop various practical decoding 
algorithms. 

Examples of cyclic codes include Hamming codes for digital communications, 
and most importantly, Reed-Solomon codes for combatting both random and 
burst errors encountered in difficult environments such as deep-space 
communications and compact discs. 

3. Convolutional codes, which distinguish themselves from linear block codes in 
the use of memory in the form of a finite-state shift register for implementing the 
encoder. For decoding convolutional codes, the Viterbi algorithm (based on 
maximum likelihood decoding) is commonly used; this algorithm performs maximum 
likelihood sequence estimation and therefore minimizes the probability of sequence 
error, rather than the symbol-error rate on a symbol-by-symbol basis (the criterion 
targeted by the BCJR algorithm). 

4. Trellis-coded modulation, which distinguishes itself from linear convolutional 
codes in the combined use of encoding and modulation in a single entity. The 
net result of so doing is the achievement of significant coding gains over 
conventional uncoded multilevel modulation schemes without having to sacrifice 
bandwidth efficiency. 

Probabilistic Compound Codes 

This second family of error-control coding schemes is exemplified by turbo codes 
and LDPC codes, which, as different as they are from each other, share a common 
property: 


To be more specific, in their own individual ways, they are both revolutionary: 


Moreover, in some specialized cases, very long rate-1/2 irregular LDPC codes have 
approached the Shannon limit to within 0.0045 dB for AWGN channels, which is 
truly remarkable (Chung et al., 2001). 

These impressive coding gains have been exploited to dramatically extend the range of 
digital communication receivers, substantially increase the bit rates of digital 
communication systems, or significantly decrease the transmitted signal energy per 
symbol. The benefits have significant implications for the design of wireless 
communications and deep-space communications, just to mention two important 
applications of digital communications. Indeed, turbo codes have already been 
standardized for use in both of these applications. 

One last comment is in order: Turbo codes have not only impacted digital 
communications in the different ways just described, but the turbo decoding paradigm has 
also impacted applications outside the traditional scope of error-control coding. One such 
example is that of turbo equalization, briefly described in Section 10.16. Indeed, we may 
justifiably say the following as the last statement of the chapter: 


Problems 

Soft-Decision Coding 

Consider a binary-input, Q-ary output discrete memoryless channel. The channel is said to be 
symmetric if the channel transition probability p(j|i) satisfies the condition 

$$p(j|0) = p(Q-1-j|1), \qquad j = 0, 1, \ldots, Q-1$$

Suppose that the channel input bits 0 and 1 are equally likely. Show that the channel output symbols 
are also equally likely; that is, 

$$p(j) = \frac{1}{Q}, \qquad j = 0, 1, \ldots, Q-1$$

Consider the quantized demodulator for binary PSK signals shown in Figure 10.3a. The quantizer is 
a four-level quantizer, normalized as in Figure P10.2. Evaluate the transition probabilities of the 
binary-input, quaternary-output discrete memoryless channel so characterized. Hence, show that it 
is a symmetric channel. Assume that the transmitted signal energy per bit is $E_b$ and the AWGN has 
zero mean and power spectral density $N_0/2$. 


Figure P10.2 (quantizer output versus quantizer input) 


Consider a binary input AWGN channel, in which the bits 1 and 0 are equally likely. The bits are 
transmitted over the channel by means of phase-shift keying. The code symbol energy is E and the 
AWGN has zero mean and power spectral density $N_0/2$. Show that the channel transition probability 
is given by 

$$p(y|0) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(y + \sqrt{\frac{2E}{N_0}}\right)^2\right], \qquad -\infty < y < \infty$$


Linear Block Codes 

Hamming codes are said to be perfect single-error correcting codes. Justify the fact that Hamming 
codes are perfect. 

Consider the following statement: 


Explain the conditions under which this statement is justified. 

In a repetition code, a single message bit is encoded into a block of identical bits to produce an 
(n, 1) block code. Considering the (5, 1) repetition code, evaluate the syndrome for: 

All five possible single-error patterns. 

All ten possible double-error patterns. 
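
As a rough illustration of the syndrome evaluation asked for above, the following Python sketch enumerates the single- and double-error patterns of the (5, 1) repetition code. It assumes one particular parity-check matrix, H = [I_4 | 1], in which each row checks one of the first four bits against the last bit. This choice of H and the helper names `syndrome` and `error_patterns` are assumptions made for illustration; an equivalent form of H would label the syndromes differently.

```python
from itertools import combinations

N = 5  # block length of the (5, 1) repetition code

# Assumed parity-check matrix H = [I_4 | 1]: row i checks bit i against bit 4.
H = [[1 if col == row else 0 for col in range(N - 1)] + [1]
     for row in range(N - 1)]

def syndrome(error):
    """Return s = e H^T over GF(2) for a length-5 error pattern."""
    return tuple(sum(h * e for h, e in zip(row, error)) % 2 for row in H)

def error_patterns(weight):
    """Generate all length-N binary patterns with the given number of ones."""
    for positions in combinations(range(N), weight):
        yield tuple(1 if i in positions else 0 for i in range(N))

single = {e: syndrome(e) for e in error_patterns(1)}
double = {e: syndrome(e) for e in error_patterns(2)}

for e, s in {**single, **double}.items():
    print(e, "->", s)
```

With this H, the 15 patterns map to 15 distinct nonzero syndromes, which is what allows a syndrome table to tell single and double errors apart.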

In a single-parity-check code, a single parity bit is appended to a block of k message bits ($m_0, m_1, \ldots, m_{k-1}$). 
The single parity bit $b_0$ is chosen so that the codeword satisfies the even parity rule: 

$$m_0 + m_1 + \cdots + m_{k-1} + b_0 = 0, \quad \text{mod } 2$$

For k = 3, set up the $2^k$ possible codewords in the code defined by this rule. 

Compare the parity-check matrix of the (7,4) Hamming code considered in Example 1 with that of a 
(4,1) repetition code. 

Consider the (7,4) Hamming code of Example 1. The generator matrix G and the parity-check 
matrix H of the code are described in that example. Show that these two matrices satisfy the 
condition 

$$\mathbf{H}\mathbf{G}^{\mathrm{T}} = \mathbf{0}$$

For the (7,4) Hamming code described in Example 1, construct the eight codewords in 
Hamming’s dual code. 

Find the minimum distance of the dual code determined in part a. 

Linear Cyclic Codes 

For an application that requires error detection only, we may use a nonsystematic code. In this 
problem, we explore the generation of such a cyclic code. Let g(X) denote the generator polynomial, 
and m(X) denote the message polynomial. We define the code polynomial c(X) simply as 

$$c(X) = m(X)g(X)$$

Hence, for a given generator polynomial, we may readily determine the codewords in the code. To 
illustrate this procedure, consider the generator polynomial for a (7,4) Hamming code: 

$$g(X) = 1 + X + X^3$$

Determine the 16 codewords in the code, and confirm the nonsystematic nature of the code. 
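
The multiplication c(X) = m(X)g(X) can be sketched in a few lines of Python. The sketch below assumes that polynomials are stored as coefficient lists in ascending powers of X (so the first bit is the coefficient of X^0); that ordering and the helper name `poly_mul_gf2` are illustrative choices, not something fixed by the problem statement.

```python
from itertools import product

G = [1, 1, 0, 1]  # g(X) = 1 + X + X^3, coefficients in ascending powers of X

def poly_mul_gf2(a, b):
    """Multiply two GF(2) polynomials given as ascending coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj
    return out

# c(X) = m(X) g(X) for each of the 16 four-bit messages.
codebook = {m: tuple(poly_mul_gf2(list(m), G))
            for m in product([0, 1], repeat=4)}

for m, c in codebook.items():
    print(m, "->", c)

# Nonsystematic: for some messages, the message bits appear neither as the
# first four nor as the last four bits of the codeword.
nonsystematic = any(c[-4:] != m and c[:4] != m for m, c in codebook.items())
print("nonsystematic:", nonsystematic)
```

Each codeword has length 7, and the message 1001, for example, does not show up unchanged at either end of its codeword, which is the nonsystematic behavior the problem asks to confirm.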

The polynomial $1 + X^7$ has $1 + X + X^3$ and $1 + X^2 + X^3$ as primitive factors. In Example 10.2, we 
used $1 + X + X^3$ as the generator polynomial for a (7,4) Hamming code. In this problem, we consider 
the adoption of $1 + X^2 + X^3$ as the generator polynomial. This should lead to a (7,4) Hamming code 
that is different from the code analyzed in Example 2. Develop the encoder and syndrome calculator 
for the generator polynomial: 

$$g(X) = 1 + X^2 + X^3$$

Compare your results with those in Example 2. 

Consider the (7,4) Hamming code defined by the generator polynomial 

$$g(X) = 1 + X + X^3$$

The codeword 0111001 is sent over a noisy channel, producing the received word 0101001 that has 
a single error. Determine the syndrome polynomial s(X) for this received word, and show that it is 
identical to the error polynomial e(X). 
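
The syndrome polynomial is the remainder of the received polynomial divided by g(X) over GF(2). The following Python sketch carries out that division for the words quoted in the problem. It assumes the bits are written in ascending powers of X (leftmost bit is the coefficient of X^0); the function name `poly_mod_gf2` is a hypothetical helper introduced here.

```python
def poly_mod_gf2(dividend, divisor):
    """Remainder of GF(2) polynomial division; ascending coefficient lists."""
    rem = list(dividend)
    deg_div = len(divisor) - 1
    for i in range(len(rem) - 1, deg_div - 1, -1):
        if rem[i]:
            # Cancel the current leading term by XOR-ing in the aligned divisor.
            for j, dj in enumerate(divisor):
                rem[i - deg_div + j] ^= dj
    return rem[:deg_div]

G = [1, 1, 0, 1]                  # g(X) = 1 + X + X^3
sent = [0, 1, 1, 1, 0, 0, 1]      # codeword 0111001
received = [0, 1, 0, 1, 0, 0, 1]  # received word 0101001

error = [s ^ r for s, r in zip(sent, received)]
syndrome = poly_mod_gf2(received, G)

print("e(X) coefficients:", error)
print("s(X) coefficients:", syndrome)
```

Under the stated bit ordering, both the error pattern and the syndrome reduce to the single term X^2, and the transmitted word itself leaves a zero remainder.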

The generator polynomial of a (15, 11) Hamming code is defined by 

$$g(X) = 1 + X + X^4$$

Develop the encoder and syndrome calculator for this code, using a systematic form for the code. 

Consider the (15,4) maximal-length code that is the dual of the (15, 11) Hamming code of Problem 
10.14. 

Find the generator polynomial g(X); hence, determine the output sequence assuming the initial state 
0001. Confirm the validity of your result by cycling the initial state through the encoder. 

Consider the (31, 15) Reed-Solomon code. 

How many bits are there in a symbol of the code? 

What is the block length in bits? 

What is the minimum distance of the code? 

How many symbols in error can the code correct? 

Convolutional Codes 

A convolutional encoder has a single shift register with two stages (i.e., constraint length ν = 3), three 
modulo-2 adders, and an output multiplexer. The generator sequences of the encoder are as follows: 

$$g^{(1)} = (1, 0, 1)$$
$$g^{(2)} = (1, 1, 0)$$
$$g^{(3)} = (1, 1, 1)$$

Draw the block diagram of the encoder. 
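
Alongside the block diagram, the behavior of such an encoder can be sketched in Python. The sketch assumes that the first element of each generator sequence taps the current input bit and the remaining elements tap the shift-register stages in order, that the multiplexer reads the three adders in the order 1, 2, 3, and that two zero tail bits flush the register. These conventions, the function name `conv_encode`, and the test message are assumptions for illustration.

```python
GENERATORS = [(1, 0, 1), (1, 1, 0), (1, 1, 1)]  # g^(1), g^(2), g^(3)

def conv_encode(message, generators=GENERATORS):
    """Encode a bit list; returns the multiplexed output including the tail."""
    memory = len(generators[0]) - 1
    padded = list(message) + [0] * memory   # tail bits return the register to zero
    register = [0] * memory                 # register[0] holds the most recent past bit
    output = []
    for bit in padded:
        window = [bit] + register           # m_t, m_{t-1}, m_{t-2}
        for g in generators:
            # Modulo-2 adder: sum of the tapped positions.
            output.append(sum(gi & wi for gi, wi in zip(g, window)) % 2)
        register = [bit] + register[:-1]    # shift the register by one stage
    return output

encoded = conv_encode([1, 0, 1, 1, 1])      # hypothetical test message
print(encoded)
```

Feeding a single 1 into the encoder returns the three generator sequences interleaved, which is a quick check that the taps are wired as intended.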

Consider the rate r = 1/2, constraint length ν = 2 convolutional encoder of Figure P10.18. The 
code is systematic. Find the encoder output produced by the message sequence 10111.... 



Figure P10.19 shows the encoder for a rate r = 1/2, constraint length ν = 4 convolutional code. 
Determine the encoder output produced by the message sequence 10111.... 



Consider the encoder of Figure P10.20 for a rate r = 2/3, constraint length ν = 2 convolutional code. 
Determine the code sequence produced by the message sequence 10111.... 

Construct the code tree for the convolutional encoder of Figure P10.19. Trace the path through the 
tree that corresponds to the message sequence 10111..., and compare the encoder output with that 
determined in Problem 10.19. 

Construct the trellis graph for the encoder of Figure P10.19, assuming a message sequence of length 
5. Trace the path through the trellis corresponding to the message sequence 10111.... Compare the 
resulting encoder output with that found in Problem 10.19. 

Construct the state graph for the encoder of Figure P10.19. Starting with the all-zero state, trace the 
path that corresponds to the message sequence 10111... and compare the resulting code sequence 
with that determined in Problem 10.19. 



Consider the encoder of Figure 10.13. 

Construct the state graph for this encoder. 

Starting from the all-zero state, trace the path that corresponds to the message sequence 10111... 
Compare the resulting sequence with that determined in Problem 10.19. 

By viewing the minimum shift keying (MSK) scheme as a finite-state machine, construct the trellis 
diagram for the MSK. (A description of the MSK is presented in Chapter 7). 

Consider a rate-1/2, constraint length-7 convolutional code with free distance $d_{\text{free}} = 10$. Calculate 
the asymptotic coding gain for the following two channels: 

Binary symmetric channel. 

Binary input AWGN channel. 
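
As a hedged numerical sketch, the calculation can be set up in Python using the commonly quoted expressions for the asymptotic coding gain over uncoded binary PSK: 10 log10(r d_free / 2) dB for the binary symmetric channel (hard decisions) and 10 log10(r d_free) dB for the binary-input AWGN channel (unquantized output). The function name and the channel labels are assumptions introduced here.

```python
import math

def asymptotic_coding_gain_db(rate, d_free, channel):
    """Asymptotic coding gain in dB relative to uncoded binary PSK."""
    if channel == "bsc":        # hard-decision decoding
        ratio = rate * d_free / 2
    elif channel == "awgn":     # unquantized (soft-decision) decoding
        ratio = rate * d_free
    else:
        raise ValueError("channel must be 'bsc' or 'awgn'")
    return 10 * math.log10(ratio)

gain_bsc = asymptotic_coding_gain_db(0.5, 10, "bsc")
gain_awgn = asymptotic_coding_gain_db(0.5, 10, "awgn")
print(f"BSC:  {gain_bsc:.2f} dB")
print(f"AWGN: {gain_awgn:.2f} dB")
```

Under these expressions the two gains differ by 10 log10(2), roughly 3 dB, which is the usual penalty attributed to hard-decision quantization.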

The transform-domain generator matrix G(D) of an RSC encoder includes ratios of polynomials in 
the delay variable D, whereas, in the case of a nonrecursive convolutional encoder, G(D) is simply a 
polynomial in D. Justify the G(D) for these two cases. 

Consider an eight-state RSC encoder, the generator matrix of which is given by 

$$g(D) = \left[1, \ \frac{1 + D + D^2 + D^3}{1 + D + D^3}\right]$$

where D is the delay variable. 

Construct the block diagram of this encoder. 

Formulate the parity-check equation that embodies all the message as well as parity-check bits in 
the time domain. 

Describe the similarities and differences between traditional encoders and RSC encoders. 

The Viterbi Algorithm 

The trellis diagram of a rate-1/2, constraint length-3 convolutional code is shown in Figure P10.30. 
The all-zero sequence is transmitted and the received sequence is 1000100000.... Using the Viterbi 
decoding algorithm, compute the decoded sequence. 

In Section 10.8, we described the Viterbi algorithm for maximum likelihood decoding of a 
convolutional code. Another application of the Viterbi algorithm is for maximum likelihood 
demodulation of a received sequence corrupted by ISI due to a dispersive channel. Figure P10.31 
shows the trellis diagram for ISI, assuming a binary data sequence. The channel is discrete, 
described by the finite impulse response (1, 0.1). The received sequence is (1.0, -0.3, -0.7, ...). Use 
the Viterbi algorithm to determine the maximum likelihood decoded version of the sequence. 



In dealing with channel equalization, a primary objective is to undo the convolution performed by a 
linear communication channel on the source signal. This task is well suited for the Viterbi equalizer 
functioning as a channel equalizer. 

What is the underlying idea in the Viterbi algorithm that ties channel equalization and 
convolutional decoding together? 

Suppose that the channel has memory defined by $2^l$, where $l$ is an integer. 

What is the required length of the window for the Viterbi equalizer? Justify your answers for both 
parts a and b of the question. 


The MAP Algorithm 

Refer back to (10.92), where 

$$A_j = \frac{\exp(-L_a(m_j)/2)}{1 + \exp(-L_a(m_j))}, \qquad j = 0, 1, 2, \ldots$$

Verify that the factor $A_j$ is a constant, regardless of whether the message bit $m_j$ is -1 or +1. 

In Example 7, we used the max-log-MAP algorithm to decode the three message bits at the output of 
the RSC encoder depicted in Figure 10.23. The computations were obtained using a Matlab code. 
Parts a and b of the figure pertain to the block diagram of the encoder and its trellis, respectively. The 
five computational steps described therein apply equally well to the log-MAP algorithm. 

Repeat Example 7, but this time develop a Matlab code for using the log-MAP algorithm to 
compute the decoded binary output of the encoder in Figure 10.23a. 

Confirm the decoded binary output produced in part a by performing the five tasks involved in 
the log-MAP algorithm, doing all the computations in the traditional way. 

Compare the decoded output produced using the log-MAP algorithm with that reported in 
Example 7. 

Comment on your results. 
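
The two algorithms differ in how they evaluate ln(exp(a) + exp(b)): the log-MAP algorithm computes it exactly through the Jacobian logarithm, max(a, b) + ln(1 + exp(-|a - b|)), whereas the max-log-MAP algorithm keeps only max(a, b). The short Python sketch below (Python is used here for illustration rather than Matlab) compares the two on made-up metric values; the function names and the numbers are assumptions, not taken from Example 7.

```python
import math

def max_star(a, b):
    """Jacobian logarithm: exact value of ln(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_log(a, b):
    """Max-log approximation: drops the correction term."""
    return max(a, b)

a, b = 1.2, 0.7   # hypothetical branch-metric sums
exact = max_star(a, b)
approx = max_log(a, b)
print(exact, approx, exact - approx)
```

The correction term is largest, ln 2, when the two arguments are equal and vanishes as they move apart, which is why the two decoders tend to agree when one trellis path clearly dominates.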

Figure P10.35 depicts two processing stages involved in the MAP decoding algorithm. The first stage 
is a convolutional encoder of rate r = k/n, producing the code vector c in response to the message 
vector m. The second stage is a mapper, represented by binary PSK. The signal energy per message 
bit at the encoder input is denoted by $E_b$; the noise spectral density of the AWGN channel is $N_0/2$. 

Let $E_s$ denote the signal energy per symbol transmitted by the binary PSK mapper. Show that the 
SNR measured at the channel output is given by 

$$(\mathrm{SNR})_{\text{out}} = \frac{E_s}{N_0} = r\,\frac{E_b}{N_0}$$

Figure P10.35 


Consider an AWGN channel with unquantized output, assuming the binary code maps 0 → -1 and 
1 → +1. Given a received signal $r_j^{(0)}$ at the channel output in response to a transmitted message bit 
$m_j$ before decoding, the a posteriori L-value is defined by 

$$L(m_j | r_j^{(0)}) = \ln\left(\frac{P(m_j = +1 | r_j^{(0)})}{P(m_j = -1 | r_j^{(0)})}\right)$$

Show that 

$$L(m_j | r_j^{(0)}) = -\frac{E_s}{N_0}\left[\left(r_j^{(0)} - 1\right)^2 - \left(r_j^{(0)} + 1\right)^2\right] + \ln\left(\frac{P(m_j = +1)}{P(m_j = -1)}\right)$$

where $E_s$ is the transmitted signal energy per encoded symbol. 

The channel reliability factor is defined by the following formula, assuming that both $m_j$ and $r_j^{(0)}$ 
are normalized by the factor $\sqrt{E_s}$: 

$$L_c = 4\,\frac{E_s}{N_0}$$

where $E_s/N_0$ is the channel output SNR. Hence, show that 

$$L(m_j | r_j^{(0)}) = L_c\, r_j^{(0)} + L_a(m_j)$$

where $L_a(m_j)$ is the a priori L-value of message bit $m_j$. 
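
A quick numerical check of this result can be sketched in Python. The sketch computes the a posteriori L-value once from the Gaussian log-likelihoods plus the log prior ratio, and once from the closed form L_c r + L_a with L_c = 4 E_s/N_0. It assumes the normalized signal model of the problem (transmitted values ±1, received sample r); the function names and the numerical inputs are hypothetical.

```python
import math

def llr_from_definition(r, es_over_n0, p_plus):
    """A posteriori L-value built from Gaussian log-likelihoods and the priors."""
    # Log-likelihoods of r given m = +1 and m = -1 (common constants cancel).
    log_like_plus = -es_over_n0 * (r - 1.0) ** 2
    log_like_minus = -es_over_n0 * (r + 1.0) ** 2
    log_prior_ratio = math.log(p_plus / (1.0 - p_plus))
    return (log_like_plus - log_like_minus) + log_prior_ratio

def llr_closed_form(r, es_over_n0, p_plus):
    """The same L-value written as L_c * r + L_a."""
    l_c = 4.0 * es_over_n0                      # channel reliability factor
    l_a = math.log(p_plus / (1.0 - p_plus))     # a priori L-value
    return l_c * r + l_a

r, snr, prior = 0.35, 0.5, 0.6   # hypothetical sample, Es/N0, and P(m = +1)
print(llr_from_definition(r, snr, prior), llr_closed_form(r, snr, prior))
```

The two functions agree because the difference of the two squared terms collapses to 4r, leaving a term linear in the received sample plus the prior contribution.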

In this problem, we expand on Problem 10.36 by considering a binary fading wireless channel, 
where the channel noise is additive, white, and Gaussian. As in Problem 10.36, start with the 
log-likelihood ratio of a transmitted message bit $m_j$ conditioned on the corresponding matched-filter 
output $r_j$ at time-unit j: 

$$L(m_j | r_j) = \ln\left(\frac{P(m_j = +1 | r_j)}{P(m_j = -1 | r_j)}\right)$$

Let a denote the fading amplitude, which distinguishes this problem from Problem 10.36. 
Show that 

$$L(m_j | r_j) = L_c\, r_j + L(m_j)$$

where 

$$L(m_j) = \ln\left(\frac{P(m_j = +1)}{P(m_j = -1)}\right)$$

and 

$$L_c = 4a\,\frac{E_s}{N_0}$$

is the modified channel reliability factor. 

For statistically independent transmissions as in dual diversity, show that the log-likelihood ratio 
takes the expanded form: 

$$L(m_j | r_j^{(1)}, r_j^{(2)}) = L_c^{(1)} r_j^{(1)} + L_c^{(2)} r_j^{(2)} + L(m_j)$$

where $L_c^{(1)}$ and $L_c^{(2)}$ denote the channel reliability factors for the two simultaneous 
transmissions of bit $m_j$ as in dual diversity. Given this result, comment on the benefit gained by 
the use of diversity. 


Let $r_c^{(1)} = p/q_1$ and $r_c^{(2)} = p/q_2$ be the code rates of RSC encoders 1 and 2 in the turbo encoder 
of Figure 10.26. Find the code rate of the turbo code. 


The feedback nature of the constituent codes in the turbo encoder of Figure 10.26 has the following 
implication: a single bit error corresponds to an infinite sequence of channel errors. Illustrate this 
using a message sequence consisting of symbol 1 followed by an infinite number of symbols 0. 

Consider the following generator matrices for rate-1/2 turbo codes: 

4-state encoder: $g(D) = \left[1, \ \dfrac{1 + D + D^2}{1 + D^2}\right]$ 

8-state encoder: $g(D) = \left[1, \ \dfrac{1 + D^2 + D^3}{1 + D + D^2 + D^3}\right]$ 

16-state encoder: $g(D) = \left[1, \ \dfrac{1 + D^4}{1 + D + D^2 + D^3 + D^4}\right]$ 


Construct the block diagram for each one of these RSC encoders. 

Set up the parity-check equation associated with each encoder. 

Turbo decoding relies on the feedback of extrinsic information. The fundamental principle adhered 
to in the turbo decoder is to avoid feeding a decoding stage information that stems from the 
constituent decoder itself. Explain the justification for this principle in conceptual terms. 

Suppose a communication receiver consists of two components: a demodulator and a decoder. The 
demodulator is based on a Markov model of the combined modulator and channel, and the decoder 
is based on a Markov model of a forward error-correction code. Discuss how the turbo principle may 
be applied to construct a joint demodulator-decoder for this system. 

Summarize the properties/attributes of turbo codes by expanding on the following six issues: 

Structural composition of the turbo encoder and decoder. 

Improvement in the speed of decoding attributed to the two constituent decoders at the expense 
of increased computational complexity. 

Similarity of turbo decoding to the use of feedback in nonlinear control theory. 

Feeding extrinsic information from constituent decoder 1 to constituent decoder 2, back and forth, 
thereby maintaining statistical independence between the bits from one iteration to the next. 

Typical termination of the turbo decoding process after a relatively small number of iterations, 
somewhere in the range of 10 to 20. 

Relatively small degradation in decoding performance of the Max-log-MAP algorithm in the 
order of 0.5 dB, compared with the MAP algorithm. 

Present a comparative evaluation of convolutional codes and turbo codes in terms of the encoding 
and decoding strategies as well as other matters that pertain to signaling over wireless 
communications. Specifically, address the following issues in the comparative evaluation: 

Encoding 

Decoding 

Fading wireless channels 

Latency (i.e., delay incurred in transmission over the channel). 

Referring back to the eight-state Ungerboeck 8-PSK of Figure 10.40, show that the asymptotic 
coding gain of this code is 3.6 dB; see Table 10.8. 


LDPC Codes 

The generator polynomial of the (7, 3) cyclic maximal-length code is given by 

$$g(X) = 1 + X + X^2 + X^4$$

Show that this code is an LDPC code by constructing its Tanner graph. 

Consider the (7,4) cyclic Hamming code, whose generator polynomial is given by 

$$g(X) = 1 + X + X^3$$

Construct the Tanner graph of this code, demonstrating that it is another example of an LDPC code. 

The extended version of the cyclic Hamming code is obtained as follows. If H is the parity-check matrix 
of the cyclic Hamming code, then the parity-check matrix of its extended version is defined by 

$$\mathbf{H}' = \begin{bmatrix} 1 \ \ 1 \ \cdots \ 1 & 1 \\ \mathbf{H} & \mathbf{0} \end{bmatrix}$$

whereby the distance between every pair of codewords in the extended code is now even. 
Construct the Tanner graph of the extended cyclic Hamming code (8, 4). 

In light of the linear cyclic codes considered in Problems 10.46 to 10.48, comment on the 
relationship between this class of codes and LDPC codes. 

In Note 20, we introduced the idea of rateless codes, emphasizing the relationship that exists between 
the new class of codes and LDPC codes. Which features distinguish rateless codes from LDPC codes? 

Develop a list comparing LDPC codes with turbo codes. 

Notes 


1. Forward error correction (FEC) relies on the controlled use of redundancy in the transmitted 
codeword for both the detection and correction of errors incurred during the course of transmission 
over a noisy channel. Irrespective of whether the decoding of the received codeword is successful, no 
further processing is performed at the receiver. Accordingly, channel coding techniques suitable for 
FEC require only a one-way link between the transmitter and receiver. 

There is another approach known as automatic-repeat request (ARQ) for solving the error-control 
problem. The underlying philosophy of ARQ is quite different from that of FEC. Specifically, ARQ 
uses redundancy merely for the purpose of error detection. Upon the detection of an error in a 
transmitted codeword, the receiver requests a repeat transmission of the corrupted codeword, which 
necessitates the use of a return path (i.e., a feedback channel from the receiver to the transmitter). 
For a comprehensive treatment of error-control coding, see Lin and Costello (2004) and Moon (2005). 

2. In medicine, the term syndrome is used to describe a pattern of symptoms that aids in the 
diagnosis of a disease. In coding, the error pattern plays the role of the disease and parity-check 
failure that of a symptom. This use of syndrome was coined by Hagelbarger (1959). 

3. The first error-correcting codes, known as Hamming codes, were invented by Hamming at about 
the same time as the conception of information theory by Shannon; for details, see the classic paper 
by Hamming (1950). 

4. Maximal-length codes, also referred to as m-sequences, are discussed further in Appendix J; they 
provide the basis for pseudo-noise (PN) sequences, which play a key role in the study of spread 
spectrum signals in Chapter 9. 

5. Reed-Solomon codes are so named in honor of their originators; see their classic paper (Reed 
and Solomon, 1960). 

The book edited by Wicker and Bhargava (1994) contains an introductory chapter on Reed-Solomon 
codes; a historical overview of the codes written by Reed and Solomon themselves; and chapters on 
the applications of Reed-Solomon codes to exploration of the solar system, the compact disc, 
automatic repeat-request protocols, and spread-spectrum multiple-access communications. 

In a historical context, Reed-Solomon codes are a subclass of the Bose-Chaudhuri and Hocquenghem 
(BCH) codes that represent a large class of powerful random error-correcting cyclic codes. However, it 
is important to recognize that the Reed-Solomon codes were discovered independently of the 
pioneering works by Hocquenghem (1959) and Bose and Ray-Chaudhuri (1960). 

For detailed mathematical treatments of binary BCH codes and nonbinary BCH codes with emphasis 
on Reed-Solomon codes, see Chapters 6 and 7 of the book by Lin and Costello (2004), respectively. 

6. Convolutional codes were invented by Elias (1955) as an alternative to linear block codes. The 
aim of that classic paper was to formulate a new class of codes with as much structure as practically 
feasible without loss of performance in using them over binary symmetric and AWGN channels. 

7. In a classic paper, Viterbi (1967) proposed a decoding algorithm for convolutional codes that has 
become known as the Viterbi algorithm. The algorithm was recognized by Forney (1972, 1973) to be 
a maximum likelihood decoder. Readable accounts of the Viterbi algorithm are presented in the book 
by Lin and Costello (2004). 

The discussion presented in this chapter is confined to the classical Viterbi algorithm involving hard 
decisions. For iterative decoding applications with soft outputs, Hagenauer and Hoeher (1989) 
described the so-called soft-output Viterbi algorithm (SOVA). For detailed discussion of both 
versions of the Viterbi algorithm, the reader is referred to Lin and Costello (2004). 

8. For details of the evaluation of asymptotic coding gain for binary symmetric and binary-input 
AWGN channels, see Lin and Costello (2004). 

9. At first sight, derivation of the MAP decoding algorithm appears to be complicated. In reality, 
however, the derivation is straightforward, given knowledge of probability theory. The derivation 
presented herein follows the book by Lin and Costello (2004). 

10. For detailed mathematical description of the log-MAP algorithm, the reader is referred to the 
book (Lin and Costello, 2004). 

11. Costello and Forney (2007) surveyed the evolution of coding on the road to channel capacity for 
AWGN channels over the course of 50 years, going back to the classic paper of Claude Shannon 
(1948). Proceeding in a stage-by-stage manner through the history of codes over band-limited 
channels, they came to the paper written by Berrou et al. (1993) on turbo codes, which was 
presented at the IEEE International Conference on Communications (ICC) in Geneva, Switzerland; 
therein, the achievement of a performance near the Shannon limit with modest decoding complexity 
was claimed by its three co-authors. Listening to this claim, the coding research community at the 
conference were stunned, with comments being whispered to the effect: “It cannot be true; they must 
have made a 3 dB error.” However, in the course of a year, the claims reported by Berrou were 
confirmed by various laboratories. And, with it, the turbo revolution was launched. 

12. The plots presented in Figure 10.26 follow those in the book by Frey (1998). 

13. Example 9 is based on the Ph.D. thesis by Li (2011), with useful comments by Maunder (2012). 

14. For the case when the interleaver’s length is high, as in the simulation results plotted in the BER 
chart of Figure 10.31, finding the floor region can be extremely time consuming. Indeed, it is for this 
reason that the number of iterations in Figure 10.31 was limited to four. 

15. The averaging method emanated from the Ph.D. thesis of Land (2005); this method is also 
described in Land et al. (2004). The first reference to the averaging method was made under “private 
communication” in Hagenauer (2004). 

16. The LDPC codes, introduced by Gallager (1960, 1963), were dormant for more than three 
decades. Lack of interest in these codes in the 1960s and 1970s may well have been attributed to the 
fact that the computers of those days were not powerful enough to cope with LDPC codes of long 
block lengths. But, reflecting back over the 1980s, it is surprising to find that lack of interest in 
LDPC codes by the coding community persisted for all those years except for a single paper: Tanner 
(1981) proposed a graphical representation for studying the structure of Gallager’s LDPC codes (as 
well as other codes) for the purpose of iterative decoding; such graphs are now called Tanner 
graphs. In any event, it was not until the introduction of turbo codes and iterative decoding by 
Berrou et al. (1993) that interest in LDPC codes was rekindled. Two factors were responsible for this 
rekindled interest (Hanzo, 2012): 

• the protection of turbo codes by a patent and unwillingness of industry to pay royalties, and 

• the rediscovery of LDPC codes by MacKay and Neal (1996; MacKay, 1999). 

And with it, the LDPC rediscovery was launched. 

17. In a historical context, Tanner’s classic paper was also forgotten for well over a decade, until its 
rediscovery by Wiberg (1996) in his seminal thesis. 

18. For a detailed treatment of the statement that the probability distribution of the minimum 
distance of an LDPC code approaches a unit step function of the block length for certain values of 
weight-pair ($t_c$, $t_r$), see Gallager (1962, 1963). 

19. The decoding algorithm of LDPC codes described in Section 10.14 follows MacKay and Neal 
(1996, 1997). 

20. The sum-product algorithm (SPA) is a computationally efficient, soft-in soft-out (SISO), iterative 
decoding algorithm based on belief propagation. The notion of belief propagation was originally 
described in Pearl (1988), wherein it was used to study statistical inference in Bayesian networks. For 
a detailed exposition of SPA for the iterative decoding of LDPC codes, see MacKay (1999). 

In a related context, a relationship exists between LDPC codes and a new class of erasure codes 
known as rateless codes, pioneered by Luby (2002). An erasure code is said to be rateless if, ideally, 
it satisfies two requirements: 

• First, encoding symbols are generated in the transmitter from an incoming data stream in an 
online manner, such that their number is potentially limitless. 

• Second, a decoder in the receiver recovers a replica of the data from an aggregate set of the 
generated encoding symbols, which is only slightly longer than the original data stream. 

Rateless codes are designed for channels without feedback and whose statistics are not known a 
priori. One such channel arises in Internet packet switching, where the probability of packet erasure is 
unknown. In any event, rateless codes are basically low-density generator-matrix codes, which are 
decoded using the SPA used to decode LDPC codes; hence the relationship between them. This 
relationship is discussed in detail in Bonello, Chen, and Hanzo (2011). 

21. In a historical context, the discovery of irregular LDPC codes was originally spearheaded by 
Luby et al. (1997, 2001), resulting from the substantial efforts that were invested in the development 
of LDPC codes after the onset of the turbo revolution. 

In terms of performance attainable by irregular LDPC codes, Chung et al. (2001) were the first to 
demonstrate that several very long rate-1/2 irregular LDPC codes for AWGN channels could be 
designed to approach the Shannon limit within 0.0045 dB, which is truly remarkable. 

22. Trellis-coded modulation was invented by Ungerboeck (1982); its historical evolution is 
described in Ungerboeck (1987). Table 10.8 is adapted from this latter paper. 

Trellis-coded modulation may be viewed as a form of signal-space coding — a viewpoint discussed 
at an introductory level in Chapter 14 of the book by Lee and Messerschmitt (1994). For an 
extensive treatment of trellis-coded modulation, see the books by Schlegel (1997) and Lin and 
Costello (2004: 875-880). 

23. A concatenated coding scheme using trellis-coded modulation first appeared in Robertson 
and Worz (1998), appropriately dubbed Turbo TCM using a parallel concatenation scheme, and has 
met with further refinements in Hanzo et al. (2003), Koca and Levy (2004), and Sun et al. (2004). 
The serial concatenation scheme can likewise apply, in which the outer encoder is still a recursive 
systematic encoder, while the inner encoder implements an Ungerboeck code for modulating the 
symbols to be sent over the communication channel. As the Ungerboeck code imposes a trellis 
structure, the inner decoder may be implemented with the MAP algorithm to obtain the bitwise a 
posteriori probabilities; the extrinsic information extraction from this inner decoder follows the 
same steps as in Section 10.16, and the coupling of decoders as per Figure 10.43 carries over 
immediately. 

24. For turbo equalization and related issues, see Douillard et al. (1995); Supnithi et al. (2003); Jiang 
et al. (2004); Kotter et al. (2004); Rad and Moon (2005); Lopes and Barry (2006); Regalia (2010). 

25. For turbo CDMA, see the papers by Alexander et al. (1999) and Wang and Poor (1999). 

The topic of DS-CDMA was discussed in Chapter 9. 

26. In formulating (10.157), we have introduced log posterior ratio and log prior ratio so as to 
avoid confusion with the traditional log likelihood ratio, particularly so when the ratio of interest in 
this section is not always between likelihood function evaluations. 

In the study of digital communications presented in preceding chapters, the Gaussian, 
Rayleigh, and Rician distributions featured in the formulation of probabilistic models in 
varying degrees. In this appendix we describe three relatively advanced distributions: 

• the chi-square distribution; 

• the log-normal distribution; 

• the Nakagami distribution. 

The chi-square distribution is featured in the study of diversity-on-receive techniques in Chapter 9 
on signaling across fading channels. Just as importantly, the log-normal distribution was 
mentioned in passing in the context of shadowing in wireless communications, also in 
Chapter 9. The Nakagami distribution is the most advanced of all the three: 

• it includes the Rayleigh distribution as a special case; 

• its shape is similar to the Rician distribution; 

• it is flexible in its applicability. 

The Chi-Square Distribution 


A chi-square ($\chi^2$) distributed random variable is produced, for example, when a Gaussian 
random variable is passed through a squaring device. Viewed in this manner, there are two 
kinds of $\chi^2$ distributions: 

• Central $\chi^2$ distribution, which is produced when the Gaussian random variable has 
zero mean. 

• Noncentral $\chi^2$ distribution, which is produced when the Gaussian random variable 
has a nonzero mean. 


In this appendix, we will discuss only the central form of the distribution. 

Consider, then, a standard Gaussian random variable X, which has zero mean and unit 
variance, as shown by 

$$f_X(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right), \qquad -\infty < x < \infty \tag{A.1}$$

Let the variable X be applied to a square-law device, producing a new random variable Y, 
whose sample value is defined by 

$$y = x^2 \tag{A.2}$$

or, equivalently, 

$$x = \pm\sqrt{y} \tag{A.3}$$

The cumulative distribution function of the random variable Y produced at the output of 
the square-law device is therefore defined by 

$$F_Y(y) = \int_{-\sqrt{y}}^{\sqrt{y}} f_X(x)\,dx \tag{A.4}$$


Differentiating $F_Y(y)$ with respect to y yields the probability density function (pdf): 

$$f_Y(y) = \frac{d}{dy}F_Y(y) = \frac{f_X(\sqrt{y}) + f_X(-\sqrt{y})}{2\sqrt{y}} \tag{A.5}$$

Substituting (A.1) into (A.5), we get 

$$f_Y(y) = \frac{1}{2\sqrt{y}}\left[\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{y}{2}\right) + \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{y}{2}\right)\right] = \frac{1}{\sqrt{2\pi y}}\exp\left(-\frac{y}{2}\right), \qquad 0 < y < \infty \tag{A.6}$$

The distribution described in (A.6) is called the chi-square ($\chi^2$) distribution with one 
degree of freedom. 

The first two moments of Y are given by 

$$E[Y] = 1$$
$$E[Y^2] = 3$$

and its variance is

$$\mathrm{var}[Y] = 2$$

Note, however, that these values are based on the standard Gaussian distribution with zero
mean and unit variance. For the general case of an ordinary Gaussian distribution with
zero mean and variance $\sigma^2$, the mean, mean-square value, and variance of the $\chi^2$ random
variable Y are, respectively, as follows:

$$E[Y] = \sigma^2$$
$$E[Y^2] = 3\sigma^4$$
$$\mathrm{var}[Y] = 2\sigma^4$$
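The moments quoted above can be checked by simulation. The following is a minimal sketch, assuming only the Python standard library; the function names, the seed, and the choice of sigma = 2 are illustrative and do not come from the text.

```python
import math
import random


def chi_square_1dof_pdf(y: float) -> float:
    """Evaluate (A.6): the pdf of Y = X**2 for a standard Gaussian X."""
    if y <= 0.0:
        return 0.0
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)


def squared_gaussian_moments(sigma: float, n_samples: int, seed: int = 1):
    """Estimate the mean and variance of Y = X**2 with X ~ N(0, sigma**2)."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    ys = [rng.gauss(0.0, sigma) ** 2 for _ in range(n_samples)]
    mean = sum(ys) / n_samples
    var = sum((y - mean) ** 2 for y in ys) / n_samples
    return mean, var


sigma = 2.0  # illustrative value
est_mean, est_var = squared_gaussian_moments(sigma, n_samples=200_000)
print(f"mean:     estimated {est_mean:.3f}, theory {sigma ** 2:.3f}")
print(f"variance: estimated {est_var:.3f}, theory {2 * sigma ** 4:.3f}")
```

The estimates should approach sigma^2 and 2*sigma^4 as the number of samples grows; the residual gap is ordinary Monte Carlo error.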

In its most general setting, derivation of the chi-square distribution follows from a set of
iid zero-mean Gaussian random variables with variance $\sigma^2$, denoted by $\{X_i\}_{i=1}^{n}$, on the
basis of which a new random variable is defined as follows:

$$Y = \sum_{i=1}^{n} X_i^2 \tag{A.7}$$

On this basis, the pdf of the random variable Y is defined by

$$f_Y(y) = \begin{cases} \dfrac{y^{(n/2)-1}}{2^{n/2}\,\sigma^{n}\,\Gamma(n/2)} \exp\left(-\dfrac{y}{2\sigma^2}\right), & y > 0 \\ 0, & \text{otherwise} \end{cases} \tag{A.8}$$

where $\Gamma(\lambda)$ is Euler's gamma function, defined by (Abramowitz and Stegun, 1965)

$$\Gamma(\lambda) = \int_0^{\infty} t^{\lambda-1} \exp(-t)\,dt \tag{A.9}$$

Figure A.1 The chi-square distribution for varying order n.

As such, the random variable Y is said to have the chi-square distribution of order n.
When n = 1, $\Gamma(1/2) = \sqrt{\pi}$ and, with $\sigma = 1$, we get the special case described in (A.6).
When n = 2, the $\chi^2$ distribution reduces to the one-sided exponential distribution.
Figure A.1 plots the $\chi^2$ distribution for varying orders: n = 1, 2, 3, 4, 5.
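As a sanity check on the order-n pdf, the sketch below evaluates it with the gamma function from the Python standard library, then integrates it numerically to confirm unit area and a mean of n*sigma^2. The helper names, the midpoint rule, and the truncation of the upper limit at 80 are assumptions made for illustration only.

```python
import math


def chi_square_pdf(y: float, n: int, sigma: float = 1.0) -> float:
    """Central chi-square pdf of order n, built from n squared N(0, sigma^2) terms."""
    if y <= 0.0:
        return 0.0
    num = y ** (n / 2.0 - 1.0) * math.exp(-y / (2.0 * sigma ** 2))
    den = 2.0 ** (n / 2.0) * sigma ** n * math.gamma(n / 2.0)
    return num / den


def midpoint_integral(func, a: float, b: float, steps: int = 100_000) -> float:
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))


# Order n = 4 with unit variance; the tail beyond y = 80 is negligible here.
area = midpoint_integral(lambda y: chi_square_pdf(y, n=4), 0.0, 80.0)
mean = midpoint_integral(lambda y: y * chi_square_pdf(y, n=4), 0.0, 80.0)
print(f"area under pdf: {area:.6f}, mean: {mean:.6f}")
```

Setting n = 1 and sigma = 1 in `chi_square_pdf` reproduces the one-degree-of-freedom pdf, which is a quick way to confirm that the two formulas agree.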

The Log-Normal Distribution 

To proceed next with the log-normal distribution, let X and Y be two random variables that
are related to each other through the logarithmic transformation

$$Y = \ln(X) \tag{A.10}$$

where ln is the natural logarithm. Conversely, we have

$$X = \exp(Y) \tag{A.11}$$

In light of this logarithmic transformation, the random variable X is said to be
log-normally distributed if the other random variable Y is normally (i.e., Gaussian) distributed.

Assuming that the Gaussian-distributed Y has nonzero mean $\mu_Y$ and variance $\sigma_Y^2$, then a
straightforward transformation based on (A.11) yields the log-normal distribution:

$$f_X(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma_Y\,x} \exp\left[-\dfrac{(\ln(x) - \mu_Y)^2}{2\sigma_Y^2}\right], & x > 0 \\ 0, & \text{otherwise} \end{cases} \tag{A.12}$$

By the same token, a probability model based on the log-normal distribution of (A.12) is
called the log-normal model.

Unlike the chi-square distribution, the log-normal distribution has two adjustable
parameters of its own, the nonzero mean $\mu_Y$ and variance $\sigma_Y^2$, both of which are inherited
from the Gaussian-distributed random variable Y. Note also that the mean and variance of
the log-normally distributed random variable X, represented by the sample value x in
(A.12), are respectively different from the exponential functions of $\mu_Y$ and $\sigma_Y^2$.

As already noted, the log-normal distribution of (A.12) is derived via the logarithmic
transformation of a Gaussian distribution. Recognizing that power plays a key role in the
study of communications, there is special merit in introducing a new random variable
related to X:

$$z = 10\log_{10}(x) \tag{A.13}$$

which is measured in decibels. Conversely, x is expressed in terms of z as follows:

$$x = 10^{z/10} \tag{A.14}$$

Hence, using (A.14) in (A.10), we get

$$Y = cZ \tag{A.15}$$

where the constant is

$$c = \frac{\ln(10)}{10} \tag{A.16}$$

Equation (A.15) shows that both Y and Z are Gaussian distributed, differing by the scaling
factor c.

Accordingly, the mean and variance of the Gaussian-distributed random variable Z are
respectively defined by

$$\mu_Z = \frac{1}{c}\,\mu_Y, \qquad \sigma_Z^2 = \frac{1}{c^2}\,\sigma_Y^2 \tag{A.17}$$

Equivalently, we may write

$$\mu_Y = c\,\mu_Z, \qquad \sigma_Y^2 = c^2\,\sigma_Z^2 \tag{A.18}$$
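The scaling between the natural-logarithm and decibel descriptions is easy to get wrong in practice, so a small numerical sketch may help. It assumes only the Python standard library; the function names, the seed, and the 8 dB example value are illustrative choices rather than anything prescribed by the text.

```python
import math
import random

C = math.log(10.0) / 10.0  # the constant c = ln(10)/10 relating Y and Z


def db_to_natural(mu_z_db: float, sigma_z_db: float):
    """Map the dB-domain mean and standard deviation of Z to those of Y = cZ."""
    return C * mu_z_db, C * sigma_z_db


def lognormal_pdf(x: float, mu_y: float, sigma_y: float) -> float:
    """Evaluate the log-normal pdf of (A.12)."""
    if x <= 0.0:
        return 0.0
    u = (math.log(x) - mu_y) / sigma_y
    return math.exp(-0.5 * u * u) / (math.sqrt(2.0 * math.pi) * sigma_y * x)


def sample_shadowing_db(mu_z_db, sigma_z_db, n_samples, seed=1):
    """Draw X = exp(Y) and report each sample in decibels, 10*log10(x)."""
    mu_y, sigma_y = db_to_natural(mu_z_db, sigma_z_db)
    rng = random.Random(seed)
    xs = [math.exp(rng.gauss(mu_y, sigma_y)) for _ in range(n_samples)]
    return [10.0 * math.log10(x) for x in xs]


samples_db = sample_shadowing_db(mu_z_db=0.0, sigma_z_db=8.0, n_samples=100_000)
mean_db = sum(samples_db) / len(samples_db)
std_db = math.sqrt(sum((z - mean_db) ** 2 for z in samples_db) / len(samples_db))
print(f"sample mean {mean_db:.2f} dB, sample standard deviation {std_db:.2f} dB")
```

The sample statistics in decibels should land close to the values passed in, which confirms that the standard deviation scales by c and the variance by c squared.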


To visualize the log-normal distribution defined in (A.12), we propose to proceed as
follows:

1. The mean $\mu_Y$ is maintained at the constant value $\mu_Y = 0$ dB.

2. The standard deviation $\sigma_Y$ (that is, the square root of the variance $\sigma_Y^2$) is assigned
three different values: $\sigma_Y = 1, 5, 10$ dB.

With the decibel as the logarithmic measure of interest, the variable x in the log-normal
distribution of (A.12) is also measured in decibels. Thus, using the assigned values of $\mu_Y$
and $\sigma_Y$ under points (1) and (2) in (A.12), we get the plots displayed in Figure A.2.

Examining Figure A.2, we make two observations that are of particular interest:

1. The log-normal distribution exhibits long tails for $\sigma_Z > 6$ dB; hence its
appropriateness as a model for the shadow-fading phenomenon in wireless
communications. From a practical perspective, a standard deviation lying in the
range $6 < \sigma_Z < 8$ dB is typical for shadowing, in which case we see that the
distribution of Figure A.2 is quite asymmetric with a small "modal" value. In other
words, $6 < \sigma_Z < 8$ dB is the most likely range of shadowing.

2. When the standard deviation $\sigma_Z$ is reduced below this range, the log-normal
distribution tends to become more symmetric and, therefore, Gaussian, centered
roughly around x = 1 dB.


Over and above having the characteristic of long tails, the log-normal distribution has two
other useful properties:

PROPERTY 1 The product (or quotient) of two independent log-normal random variables is also log-normal.

This property follows from the fact that the exponents, represented by the random variable Y or Z, add
(or subtract). Since the exponents are Gaussian distributed, they remain Gaussian after the
addition (or subtraction); hence the validity of Property 1.

Figure A.2 The log-normal distribution.

As a corollary to this property, we may also state:

PROPERTY 2 The product of a large number of iid random variables is asymptotically log-normal.

This property is the counterpart of the central limit theorem, involving the addition of a
large number of iid random variables. The second property holds for two reasons:

• First, the exponents of the random variables involved in forming the product add.

• Second, applying the central limit theorem to the addition of the exponents, the result
asymptotically converges to a Gaussian distribution; hence the validity of Property 2.
The Nakagami Distribution 


As different as the distributions covered until this point are, namely the Rayleigh and 
Rician distributions derived in Chapter 4, as well as the chi-square and log-normal 
distributions derived in this appendix, all four of them share a common factor: 


In the last part of this appendix we describe another distribution, namely the Nakagami 
distribution, which is different from all the others in the following sense: 


Indeed, it is for this important reason (and a few others that will be discussed) that the 
Nakagami distribution is commonly used as a model for wireless communications. 

To be specific, a random variable X whose pdf is described by the equation 


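$$f_X(x) = \begin{cases} \dfrac{2}{\Gamma(m)} \left(\dfrac{m}{\Omega}\right)^{m} x^{2m-1} \exp\left(-\dfrac{m}{\Omega}\,x^2\right), & x > 0 \\ 0, & \text{otherwise} \end{cases} \tag{A.19}$$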


is said to have the Nakagami-m distribution. The random variable X is itself referred to as 
a Nakagami-distributed random variable (Nakagami, 1960). 

The two parameters that characterize this distribution are defined as follows: 


1. The parameter $\Omega$, which is the mean-square value of the random variable X; that is,

$$\Omega = E[X^2] \tag{A.20}$$

2. The second parameter, m, called the fading figure, is defined by the ratio:

$$m = \frac{\Omega^2}{E[(X^2 - \Omega)^2]} = \frac{(E[X^2])^2}{E[(X^2 - E[X^2])^2]}, \quad \text{for } m \ge \frac{1}{2} \tag{A.21}$$

Note the restriction that is placed on m for (A.21) to hold. Close examination of the
definitions embodied in (A.20) and (A.21) reveals that the statistical characterization of
the fading figure m involves two moments:

• the square of the mean-square value of the random variable X in the numerator and

• the variance of the squared random variable X in the denominator.

It follows, therefore, that the fading figure m is dimensionless.
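The two defining moments suggest a simple moment-based estimator for the parameters. The sketch below generates Nakagami-m samples and recovers both parameters from them. It relies on a standard fact that is not derived in the text: if X is Nakagami-m distributed, then X squared is gamma distributed with shape m and scale Omega/m. The function names, seed, and parameter values are illustrative assumptions.

```python
import math
import random


def sample_nakagami(m: float, omega: float, n_samples: int, seed: int = 1):
    """Draw Nakagami-m samples as square roots of gamma variates.

    Assumes X**2 ~ Gamma(shape=m, scale=omega/m), a standard property
    of the Nakagami-m distribution.
    """
    if m < 0.5:
        raise ValueError("the fading figure must satisfy m >= 1/2")
    rng = random.Random(seed)
    return [math.sqrt(rng.gammavariate(m, omega / m)) for _ in range(n_samples)]


def estimate_parameters(xs):
    """Moment estimates: omega as the mean of x^2, m as omega^2 / var(x^2)."""
    powers = [x * x for x in xs]
    omega_hat = sum(powers) / len(powers)
    var_power = sum((p - omega_hat) ** 2 for p in powers) / len(powers)
    return omega_hat, omega_hat ** 2 / var_power


xs = sample_nakagami(m=2.5, omega=3.0, n_samples=200_000)
omega_hat, m_hat = estimate_parameters(xs)
print(f"omega estimate {omega_hat:.3f}, fading-figure estimate {m_hat:.3f}")
```

Because the estimator for m is a ratio of squared quantities with the same units, it is dimensionless, in line with the remark above.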

For visualization, the Nakagami-m distribution is plotted in Figure A.3 for varying
values of m. Two observations from these plots are noteworthy:

1. For m = 1, the Nakagami-m distribution reduces to the Rayleigh distribution; in
other words, the Rayleigh distribution is a special case of the Nakagami-m distribution.

2. The Nakagami and Rician distributions have a similar shape.

To elaborate on point 2, for m > 1 we find that the fading figure m can be computed from
the dimensionless Rice factor K (discussed in Chapter 4), as shown in (Stüber, 1996):

$$m = \frac{(K + 1)^2}{2K + 1} \tag{A.22}$$

Figure A.3 The Nakagami-m distribution, presenting theoretical and simulation results for
varying fading figure m.

Conversely,

$$K = \frac{(m^2 - m)^{1/2}}{m - (m^2 - m)^{1/2}} \tag{A.23}$$
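The two mappings between the Rice factor and the fading figure are inverses of each other, which the following short sketch confirms numerically. The function names are hypothetical, and the guard on m reflects the fact that the square root is real only for m of at least 1.

```python
import math


def fading_figure_from_rice_factor(k: float) -> float:
    """Compute m = (K + 1)^2 / (2K + 1) for a Rice factor K >= 0."""
    if k < 0.0:
        raise ValueError("the Rice factor K must be nonnegative")
    return (k + 1.0) ** 2 / (2.0 * k + 1.0)


def rice_factor_from_fading_figure(m: float) -> float:
    """Compute K = sqrt(m^2 - m) / (m - sqrt(m^2 - m)) for m >= 1."""
    if m < 1.0:
        raise ValueError("the mapping requires m >= 1")
    root = math.sqrt(m * m - m)
    return root / (m - root)


for k in (0.0, 1.0, 4.0, 10.0):
    m = fading_figure_from_rice_factor(k)
    print(f"K = {k:5.1f} -> m = {m:.4f} -> K = {rice_factor_from_fading_figure(m):.4f}")
```

Note that K = 0 maps to m = 1, the Rayleigh case, where the Rician distribution has no line-of-sight component.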

A cautionary note is in order, however. Although the Nakagami-m and Rician distributions
appear to have good agreement insofar as their shapes are concerned, they have different
slopes at the origin, x = 0; this difference has a significant impact on the achievable
diversity, with the advantage residing in the Nakagami distribution (Molisch, 2011).

From a practical perspective, the Nakagami-m distribution has the following attributes,
in accordance with (A.20) and (A.21):


This succinct statement re-emphasizes the point we made at the beginning of this
subsection:


Figure A.4 A set of sample functions of the log-normal distribution and its approximation
with the Nakagami distribution as the fading figure m is increased.

Indeed, with this important point in mind, the plots presented in Figure A.3 actually
include points (denoted by crosses) that pertain to arbitrarily selected wireless data.

Figure A.4 provides further demonstration of the inherent flexibility of the Nakagami-m
distribution in approximating the log-normal distribution. It is clearly shown that the
approximation gets gradually better as the fading figure m is increased.

It is not surprising, therefore, to find that the Nakagami-m distribution outperforms the
Rayleigh and Rician distributions, particularly so in urban wireless communication
environments.

Notes 


1. The visualization procedure described herein for the log-normal distribution follows Cavers
(2000).

Two other procedures for visualizing the log-normal distribution are described in the literature, as
summarized here:

• In Proakis and Salehi (2008), the standard deviation is set to $\sigma_Y = 1$ and the mean $\mu_Y$ is varied,
with both $\mu_Y$ and $\sigma_Y$ measured in volts.

• In Goldstein (2005), a new random variable $\psi$, defined as the ratio of transmit-to-receive
power, is used in place of x, and a new formula for the log-normal distribution is derived. In so
doing, the use of power measured in decibels plays a prominent role in a new formulation of
the log-normal distribution. However, this new formulation takes values for $0 < \psi < \infty$, which
raises a physically unacceptable scenario; specifically, for $\psi < 1$, the receive power assumes a
value greater than the transmit power.

• Fortunately, the probability of this unacceptable scenario arising is very small, provided that
the mean $\mu_\psi$, expressed in decibels, is positive and large. It is thus claimed that the
log-normal model based on the random variable $\psi$ captures the underlying physical model very
accurately when the mean is very large compared to 0 dB.

2. The properties of the log-normal distribution described herein follow Cavers (2000).

3. The procedure used to compute the simulated points in the plots presented in Figure A.3 follows
Matthaiou and Laurenson (2007).

4. This note provides additional noteworthy material on the Nakagami-m distribution. In Turin et al.
(1972) and Suzuki (1977), it is demonstrated that the Nakagami-m distribution provides the best
statistical fit to measured data in urban wireless environments.

Two other papers of interest are Braun and Dersch (1991), in which a physical interpretation of the
Nakagami-m distribution is presented, and Abdi et al. (2000), in which the statistical characteristics
of the Nakagami and Rician distributions are summarized.

Moreover, there are three other papers on the Nakagami distribution that deserve attention. Given a
set of real-life fading-channel data, various papers have been published on how to estimate the
parameter m in the Nakagami model. In Zhang (2002), numerical results are presented to show that
none of the previously published results exceed the classical one by Greenwood and Durand (1960).
Correlated Rayleigh fading lends itself readily to the simulation of a fading channel by virtue of its
relationship to a complex Gaussian process. Unfortunately, this is not so with the Nakagami
distribution. In Zhang (2000), a decomposition technique is described for the efficient generation of
a correlated Nakagami fading channel.

In Zhang (2003), a generic correlated Nakagami-m model is described using a multiple joint
characteristic function, which allows for an arbitrary covariance matrix and distinct real fading
parameters.



Bounds on the Q-Function 


Following Chapter 3, we define the Q-function as

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \exp\left(-\frac{t^2}{2}\right) dt \tag{B.1}$$

which represents the area under the tail of the standard Gaussian distribution. In this
appendix, we derive some useful bounds on the Q-function for large positive values of x.
To this end, we change the variable of integration in (B.1) by setting

$$z = x - t \tag{B.2}$$

and then recast (B.1) in the form

$$Q(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) \int_{-\infty}^{0} \exp(xz) \exp\left(-\frac{1}{2}z^2\right) dz \tag{B.3}$$

For any real z, the value of $\exp(-z^2/2)$ lies between the successive partial sums of the
power series:

$$1 - \frac{(z^2/2)}{1!} + \frac{(z^2/2)^2}{2!} - \frac{(z^2/2)^3}{3!} + \cdots$$

Therefore, for x > 0 we find that, on using (n + 1) terms of this series, the Q-function lies
between the values taken by the integral

$$\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) \int_{-\infty}^{0} \left[1 - \frac{(z^2/2)}{1!} + \frac{(z^2/2)^2}{2!} - \cdots \pm \frac{(z^2/2)^n}{n!}\right] \exp(xz)\,dz$$

for even n and odd n. We now make another change in the integration variable by setting

$$v = -xz$$

and also use the definite integral

$$\int_0^{\infty} v^n \exp(-v)\,dv = n!$$

Doing so, we obtain the following asymptotic expansion for the Q-function, assuming that
x > 0:

$$Q(x) \simeq \frac{\exp(-x^2/2)}{\sqrt{2\pi}\,x} \left[1 - \frac{1}{x^2} + \frac{1 \times 3}{x^4} - \cdots \pm \frac{1 \times 3 \times 5 \cdots (2n-1)}{x^{2n}}\right] \tag{B.6}$$

For large positive values of x, the successive terms of the series on the right-hand side of
(B.6) decrease very rapidly. We thus deduce two simple bounds on the Q-function, one
lower and the other upper, as shown by

$$\frac{\exp(-x^2/2)}{\sqrt{2\pi}\,x} \left(1 - \frac{1}{x^2}\right) < Q(x) < \frac{\exp(-x^2/2)}{\sqrt{2\pi}\,x} \tag{B.7}$$

For large positive x, a second bound on the Q-function is obtained by simply ignoring the
multiplying factor 1/x in the upper bound of (B.7), in which case we write

$$Q(x) < \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) \tag{B.8}$$
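The bounds can be compared against the exact Q-function numerically. The sketch below assumes the standard identity Q(x) = erfc(x/sqrt(2))/2, which is not stated in this appendix, and uses only the Python standard library; the function names are illustrative.

```python
import math


def q_function(x: float) -> float:
    """Exact Q-function via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_upper_bound(x: float) -> float:
    """Upper bound of (B.7): exp(-x^2/2) / (sqrt(2*pi) * x), for x > 0."""
    return math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * x)


def q_lower_bound(x: float) -> float:
    """Lower bound of (B.7): the upper bound scaled by (1 - 1/x^2)."""
    return q_upper_bound(x) * (1.0 - 1.0 / (x * x))


def q_loose_upper_bound(x: float) -> float:
    """Looser bound of (B.8), obtained by dropping the factor 1/x."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


for x in (1.5, 2.0, 3.0, 4.0, 5.0):
    print(f"x = {x:3.1f}: {q_lower_bound(x):.3e} < {q_function(x):.3e} "
          f"< {q_upper_bound(x):.3e} < {q_loose_upper_bound(x):.3e}")
```

The two bounds of (B.7) tighten quickly as x grows, whereas the bound of (B.8) is looser by the factor x; it is weaker than the upper bound of (B.7) only for x greater than 1.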

Figure B.1 contains plots of the following quantities:

• tabulated values of the Q-function presented in Table 3.1;

• the lower and upper bounds of (B.7); 

• the upper bound of (B.8). 



Figure B.1 Lower and upper bounds on the Q-function.


Bessel Functions 


Series Solution of Bessel’s Equation 


In a certain class of differential or difference equations encountered in many branches of 
science and engineering, Bessel functions and their modified versions feature commonly 
in their solutions, just as cosines and sines feature commonly in trigonometry. 

For example, in spectral analysis of analog frequency-modulated (FM) signals 
(discussed briefly in Chapter 2), the analysis involves the use of Bessel functions of 
infinite order; see Haykin (2001) for details of this analysis. For yet another example, in 
studying the Jakes FIR model in Chapter 9 on signaling over fading channels, we found 
that the Bessel functions of zero order featured in the autocorrelation function at the input 
of the mobile receiver. Then, in Chapter 7 on signaling over AWGN channels, the 
modified Bessel function of zero order featured in arriving at the nondata-aided recursive 
algorithm for symbol timing in the receiver. 

These motivating examples prompt us to devote this appendix to mathematical analysis 
of Bessel functions and their modified versions. 

In its most basic form, Bessel's equation of order n is written as

$$x^2 \frac{d^2 y}{dx^2} + x \frac{dy}{dx} + (x^2 - n^2)\,y = 0 \tag{C.1}$$

which is one of the most important of all variable-coefficient differential equations. For
each n, a solution of this equation is defined by the power series

$$J_n(x) = \sum_{m=0}^{\infty} \frac{(-1)^m (x/2)^{n+2m}}{m!\,(n+m)!} \tag{C.2}$$

The function $J_n(x)$ is called a Bessel function of the first kind of order n. Equation (C.1),
once divided through by $x^2$, has two coefficient functions to deal with: $1/x$ and $(1 - n^2/x^2)$.
Hence, it has no finite singular points except for the origin. It follows, therefore, that the
series expansion of (C.2) converges for all x > 0. This equation may thus be used to
numerically calculate $J_n(x)$ for n = 0, 1, 2, .... Table C.1 gives values of $J_n(x)$ for
different order n and varying x.
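To illustrate the numerical use of the power series, here is a minimal sketch that sums a truncated version of it with the Python standard library. The function name and the default of 40 terms are illustrative assumptions; that many terms is ample for the moderate arguments considered in this appendix, but a large argument would call for a different method.

```python
import math


def bessel_j(n: int, x: float, terms: int = 40) -> float:
    """Truncated power series for the Bessel function of the first kind, order n >= 0."""
    total = 0.0
    for m in range(terms):
        # m-th term: (-1)^m (x/2)^(n+2m) / (m! (n+m)!)
        total += ((-1) ** m) * (x / 2.0) ** (n + 2 * m) / (
            math.factorial(m) * math.factorial(n + m)
        )
    return total


for x in (0.2, 1.0, 2.4, 4.0):
    print(f"x = {x:4.2f}: J0 = {bessel_j(0, x):8.4f}, J1 = {bessel_j(1, x):8.4f}")
```

Rounded to four decimal places, these values can be compared directly with the tabulated entries for orders 0 and 1.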

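The function $J_n(x)$ may also be expressed in the form of an integral as

$$J_n(x) = \frac{1}{\pi} \int_0^{\pi} \cos(x\sin\theta - n\theta)\,d\theta \tag{C.3}$$

or, equivalently,

$$J_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(jx\sin\theta - jn\theta)\,d\theta \tag{C.4}$$

Properties of the Bessel Function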


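The Bessel function $J_n(x)$ has the following properties:

1. $$J_n(x) = (-1)^n J_{-n}(x)$$

To prove this relation, we replace $\theta$ by $(\pi - \theta)$ in (C.3). Then, noting that $\sin(\pi - \theta)$
is equal to $\sin\theta$, we get

$$J_n(x) = \frac{1}{\pi} \int_0^{\pi} \cos(x\sin\theta + n\theta - n\pi)\,d\theta = \frac{1}{\pi} \int_0^{\pi} [\cos(n\pi)\cos(x\sin\theta + n\theta) + \sin(n\pi)\sin(x\sin\theta + n\theta)]\,d\theta$$

For integer values of n, we have

$$\cos(n\pi) = (-1)^n$$
$$\sin(n\pi) = 0$$

Therefore,

$$J_n(x) = \frac{(-1)^n}{\pi} \int_0^{\pi} \cos(x\sin\theta + n\theta)\,d\theta \tag{C.6}$$

From (C.3), we also find that by replacing n with -n:

$$J_{-n}(x) = \frac{1}{\pi} \int_0^{\pi} \cos(x\sin\theta + n\theta)\,d\theta \tag{C.7}$$

The desired result follows immediately from (C.6) and (C.7).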


2. $$J_n(x) = (-1)^n J_n(-x)$$

This relation is obtained by replacing x with -x in (C.3), and then using (C.6).

3. $$J_{n-1}(x) + J_{n+1}(x) = \frac{2n}{x} J_n(x)$$

This recurrence formula is useful in constructing tables of Bessel coefficients; its
derivation follows from the power series of (C.2).

4. For small values of x, we have

$$J_n(x) \approx \frac{x^n}{2^n\,n!}$$

This relation is obtained simply by retaining the first term in the power series of
(C.2) and ignoring the higher order terms. Thus, when x is small, we have

$$J_0(x) \approx 1$$
$$J_1(x) \approx \frac{x}{2}$$
$$J_n(x) \approx 0 \quad \text{for } n > 1$$

5. For large values of x, we have

$$J_n(x) \approx \sqrt{\frac{2}{\pi x}} \cos\left(x - \frac{\pi}{4} - \frac{n\pi}{2}\right)$$

This property shows that, for large values of x, the Bessel function $J_n(x)$ behaves like
a sine wave with progressively decreasing amplitude.

6. With x real and fixed, $J_n(x)$ approaches zero as the order n goes to infinity.


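7. $$\sum_{n=-\infty}^{\infty} J_n(x) \exp(jn\phi) = \exp(jx\sin\phi) \tag{C.13}$$

To prove this property, consider the sum $\sum_{n=-\infty}^{\infty} J_n(x)\exp(jn\phi)$ and use (C.4) for
$J_n(x)$ to obtain

$$\sum_{n=-\infty}^{\infty} J_n(x) \exp(jn\phi) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} \exp(jn\phi) \int_{-\pi}^{\pi} \exp(jx\sin\theta - jn\theta)\,d\theta$$

Interchanging the order of integration and summation:

$$\sum_{n=-\infty}^{\infty} J_n(x) \exp(jn\phi) = \frac{1}{2\pi} \int_{-\pi}^{\pi} d\theta\,\exp(jx\sin\theta) \sum_{n=-\infty}^{\infty} \exp[jn(\phi - \theta)] \tag{C.14}$$

We now invoke the following relation from Fourier transform theory:

$$\delta(\phi) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} \exp(jn\phi), \quad -\pi \le \phi \le \pi \tag{C.15}$$

where $\delta(\phi)$ is the delta function. Therefore, using (C.15) in (C.14) and then applying
the sifting property of the delta function, we get

$$\sum_{n=-\infty}^{\infty} J_n(x) \exp(jn\phi) = \int_{-\pi}^{\pi} \exp(jx\sin\theta)\,\delta(\phi - \theta)\,d\theta = \exp(jx\sin\phi)$$

which is the desired result.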


8. $$\sum_{n=-\infty}^{\infty} J_n^2(x) = 1 \quad \text{for all } x$$

To prove this property, we may proceed as follows. We observe that $J_n(x)$ is real;
hence, multiplying (C.4) by its own complex conjugate and summing over all
possible values of n, we get

$$\sum_{n=-\infty}^{\infty} J_n^2(x) = \frac{1}{(2\pi)^2} \sum_{n=-\infty}^{\infty} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \exp(jx\sin\theta - jn\theta - jx\sin\phi + jn\phi)\,d\theta\,d\phi$$

Interchanging the order of double integration and summation:

$$\sum_{n=-\infty}^{\infty} J_n^2(x) = \frac{1}{(2\pi)^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} d\theta\,d\phi\,\exp[jx(\sin\theta - \sin\phi)] \sum_{n=-\infty}^{\infty} \exp[jn(\phi - \theta)] \tag{C.17}$$

Using (C.15) in (C.17) and then applying the sifting property of the delta function,
we finally get

$$\sum_{n=-\infty}^{\infty} J_n^2(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} d\theta = 1$$

which is the desired result.

Modified Bessel Function


Consider the modified Bessel equation:

$$x^2 \frac{d^2 y}{dx^2} + x \frac{dy}{dx} - (x^2 + n^2)\,y = 0 \tag{C.18}$$

With $j^2 = -1$, we may rewrite this equation as

$$x^2 \frac{d^2 y}{dx^2} + x \frac{dy}{dx} + (j^2 x^2 - n^2)\,y = 0$$

from which it is therefore evident that (C.18) is nothing but Bessel's equation, namely
(C.1), rewritten with x replaced by jx. Thus, replacing x by jx in (C.2) and again noting that
$-1 = j^2$, we get

$$J_n(jx) = \sum_{m=0}^{\infty} \frac{(-1)^m (jx/2)^{n+2m}}{m!\,(n+m)!} = j^n \sum_{m=0}^{\infty} \frac{(x/2)^{n+2m}}{m!\,(n+m)!}$$

Next we note that $J_n(jx)$ multiplied by a constant will still be a solution of Bessel's
equation. Accordingly, we multiply $J_n(jx)$ by the constant $j^{-n}$, obtaining

$$j^{-n} J_n(jx) = \sum_{m=0}^{\infty} \frac{(x/2)^{n+2m}}{m!\,(n+m)!}$$

This new function is called the modified Bessel function of the first kind of order n,
denoted by $I_n(x)$. We may thus formally express a solution of the modified Bessel equation
(C.18) as

$$I_n(x) = j^{-n} J_n(jx) = \sum_{m=0}^{\infty} \frac{(x/2)^{n+2m}}{m!\,(n+m)!} \tag{C.19}$$

The modified Bessel function $I_n(x)$ is a monotonically increasing real function of the
argument x > 0 for all n, as shown in Figure C.1 for n = 0, 1.

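The modified Bessel function $I_n(x)$ is identical to the original Bessel function $J_n(x)$
except for an important difference: the terms in the series expansion of $I_n(x)$ in (C.19) are
all positive, whereas those in the series expansion of $J_n(x)$ in (C.2) alternate in sign.

The relationship between $J_n(x)$ and $I_n(x)$ is analogous to the way in which the
trigonometric functions cos x and sin x are related to the hyperbolic functions cosh x and
sinh x, respectively.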

An interesting property of the modified Bessel function $I_n(x)$ is derived from (C.13).
Specifically, by replacing x by jx and the angle $\phi$ by $\theta - \pi/2$ in this equation and then
invoking the definition of $I_n(x)$ in the first line of (C.19), we obtain

$$\sum_{n=-\infty}^{\infty} I_n(x) \exp(jn\theta) = \exp(x\cos\theta)$$

From this relation it follows that

$$I_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(x\cos\theta) \cos(n\theta)\,d\theta$$

This integral formula for $I_n(x)$ may, of course, also be derived from (C.4) by making the
appropriate changes.

When the argument x is small, we obtain the following asymptotic estimates directly
from the series representation of (C.19):

$$I_0(x) \to 1 \quad \text{for } x \to 0$$

and

$$I_n(x) \to 0 \quad \text{for } n \ge 1 \text{ and } x \to 0$$

Figure C.1 Plots of modified Bessel functions of the first kind $I_0(x)$ and $I_1(x)$.

For large values of x we have the following asymptotic estimate for $I_n(x)$, which is valid
for all integers $n \ge 0$:

$$I_n(x) \approx \frac{\exp(x)}{\sqrt{2\pi x}} \quad \text{for } x \to \infty$$

Note that this asymptotic behavior of $I_n(x)$ is independent of the order n for large values of x.
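The series of (C.19) and the large-argument estimate can be compared numerically. The sketch below is illustrative only: the function names and the default of 60 terms are assumptions, and the truncated series is adequate for the arguments used here (up to about 30) rather than for arbitrary x.

```python
import math


def bessel_i(n: int, x: float, terms: int = 60) -> float:
    """Truncated power series (C.19) for the modified Bessel function, order n >= 0."""
    return sum(
        (x / 2.0) ** (n + 2 * m) / (math.factorial(m) * math.factorial(n + m))
        for m in range(terms)
    )


def bessel_i_large_x(x: float) -> float:
    """Large-argument estimate exp(x) / sqrt(2*pi*x), independent of the order n."""
    return math.exp(x) / math.sqrt(2.0 * math.pi * x)


for x in (1.0, 5.0, 30.0):
    approx = bessel_i_large_x(x)
    print(f"x = {x:5.1f}: I0/approx = {bessel_i(0, x) / approx:.4f}, "
          f"I1/approx = {bessel_i(1, x) / approx:.4f}")
```

Both ratios drift toward unity as x increases, with order 0 approaching from above and order 1 from below, which illustrates that the estimate loses its dependence on n only for large x.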


In numerical terms, Table C.1 provides a limited set of values of the Bessel function $J_n(x)$
and modified Bessel function $I_n(x)$. More extensive tables of these two functions are given
in Abramowitz and Stegun (1965).

Table C.1 Values of Bessel functions and modified
Bessel functions of the first kind

x       J_0(x)     J_1(x)     I_0(x)     I_1(x)
0.00    1.0000     0.0000     1.0000     0.0000
0.20    0.9900     0.0995     1.0100     0.1005
0.40    0.9604     0.1960     1.0404     0.2040
0.60    0.9120     0.2867     1.0920     0.3137
0.80    0.8463     0.3688     1.1665     0.4329
1.00    0.7652     0.4401     1.2661     0.5652
1.20    0.6711     0.4983     1.3937     0.7147
1.40    0.5669     0.5419     1.5534     0.8861
1.60    0.4554     0.5699     1.7500     1.0848
1.80    0.3400     0.5815     1.9896     1.3172
2.00    0.2239     0.5767     2.2796     1.5906
2.20    0.1104     0.5560     2.6291     1.9141
2.40    0.0025     0.5202     3.0493     2.2981
2.60   -0.0968     0.4708     3.5533     2.7554
2.80   -0.1850     0.4097     4.1573     3.3011
3.00   -0.2601     0.3391     4.8808     3.9534
3.20   -0.3202     0.2613     5.7472     4.7343
3.40   -0.3643     0.1792     6.7848     5.6701
3.60   -0.3918     0.0955     8.0277     6.7927
3.80   -0.4026     0.0128     9.5169     8.1404
4.00   -0.3971    -0.0660    11.3019     9.7595


Notes 


1. Equation (C.1) is named after the German mathematician and astronomer Bessel. For detailed
treatments of the solution to this equation and related issues, see the books by Wylie and Barrett 
(1982) and Watson (1966). 


Method of Lagrange Multipliers 

Optimization Involving a Single Equality Constraint 


Consider the minimization of a real-valued function $f(\mathbf{w})$ that is a quadratic function of a
parameter vector $\mathbf{w}$, subject to the constraint

$$\mathbf{w}^{\dagger}\mathbf{s} = g \tag{D.1}$$

where $\mathbf{s}$ is a prescribed vector and g is a complex constant; the superscript $\dagger$ denotes
Hermitian transposition. We may redefine the constraint by introducing a new function
$c(\mathbf{w})$ that is linear in $\mathbf{w}$, as shown by

$$c(\mathbf{w}) = \mathbf{w}^{\dagger}\mathbf{s} - g = 0 + j0 \tag{D.2}$$

In general, the vectors $\mathbf{w}$ and $\mathbf{s}$ and the function $c(\mathbf{w})$ are all complex. For example, in a
beamforming application, the vector $\mathbf{w}$ represents a set of complex weights applied to the
individual sensor outputs and $\mathbf{s}$ represents a steering vector whose elements are defined by
a prescribed "look" direction; the function $f(\mathbf{w})$ to be minimized represents the
mean-square value of the overall beamformer output. In a harmonic retrieval application, for
another example, $\mathbf{w}$ represents the tap-weight vector of an FIR filter and $\mathbf{s}$ represents a
sinusoidal vector whose elements are determined by the angular frequency of a complex
sinusoid contained in the filter input; the function $f(\mathbf{w})$ represents the mean-square value
of the filter output. In any event, assuming that the issue is one of minimization, we may
state the constrained optimization problem as follows:

Minimize the real-valued function $f(\mathbf{w})$, subject to the constraint $c(\mathbf{w}) = 0 + j0$. (D.3)

The method of Lagrange multipliers converts the problem of constrained minimization
just described into one of unconstrained minimization by the introduction of Lagrange
multipliers. First, we use the real function $f(\mathbf{w})$ and the complex constraint function $c(\mathbf{w})$
to define a new real-valued function

$$h(\mathbf{w}) = f(\mathbf{w}) + \lambda_1 \mathrm{Re}[c(\mathbf{w})] + \lambda_2 \mathrm{Im}[c(\mathbf{w})] \tag{D.4}$$

where $\lambda_1$ and $\lambda_2$ are real Lagrange multipliers and

$$c(\mathbf{w}) = \mathrm{Re}[c(\mathbf{w})] + j\,\mathrm{Im}[c(\mathbf{w})] \tag{D.5}$$

Now we define a complex Lagrange multiplier:

$$\lambda = \lambda_1 + j\lambda_2 \tag{D.6}$$

The Re[·] and Im[·] in (D.4) and (D.5) denote real and imaginary operators, respectively.
We may then rewrite (D.4) in the form

$$h(\mathbf{w}) = f(\mathbf{w}) + \mathrm{Re}[\lambda^{*} c(\mathbf{w})] \tag{D.7}$$

where the asterisk denotes complex conjugation.

Next, we minimize the function $h(\mathbf{w})$ with respect to the vector $\mathbf{w}$. To do this, we set the
conjugate derivative $\partial h / \partial \mathbf{w}^{*}$ equal to the null vector:

$$\frac{\partial f}{\partial \mathbf{w}^{*}} + \frac{\partial}{\partial \mathbf{w}^{*}} \left(\mathrm{Re}[\lambda^{*} c(\mathbf{w})]\right) = \mathbf{0} \tag{D.8}$$

The system of simultaneous equations consisting of (D.8) and the original constraint given
in (D.2) defines the optimum solutions for the vector $\mathbf{w}$ and the Lagrange multiplier $\lambda$. We
call (D.8) the adjoint equation and (D.2) the primal equation (Dorny, 1975).
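To make the procedure concrete, consider a sketch under assumptions that go beyond the text: take the quadratic cost to be f(w) = w†Rw with a diagonal, positive-definite matrix R. The adjoint equation (D.8) then reads Rw + (λ*/2)s = 0, and eliminating λ by means of the primal equation (D.2) gives the closed form w = g* R⁻¹s / (s†R⁻¹s). The function names and numerical values below are hypothetical, and the diagonal R is chosen only to keep the example free of matrix libraries.

```python
def constrained_minimizer(r_diag, s, g):
    """Minimize w^H R w subject to w^H s = g, for R = diag(r_diag) with r_k > 0."""
    r_inv_s = [s_k / r_k for s_k, r_k in zip(s, r_diag)]  # the vector R^{-1} s
    # s^H R^{-1} s is real and positive because R is positive definite.
    denom = sum((s_k.conjugate() * v).real for s_k, v in zip(s, r_inv_s))
    return [g.conjugate() * v / denom for v in r_inv_s]


def constraint_lhs(w, s):
    """Return w^H s, which should equal g at the solution."""
    return sum(w_k.conjugate() * s_k for w_k, s_k in zip(w, s))


def cost(r_diag, w):
    """Return f(w) = w^H R w for the diagonal R."""
    return sum(r_k * abs(w_k) ** 2 for r_k, w_k in zip(r_diag, w))


# Illustrative two-element problem (all numbers are arbitrary).
r_diag = [2.0, 0.5]
s = [1 + 0j, 0.6 - 0.8j]
g = 1 + 0.5j
w_opt = constrained_minimizer(r_diag, s, g)
print("w_opt      =", w_opt)
print("w_opt^H s  =", constraint_lhs(w_opt, s))
print("f(w_opt)   =", cost(r_diag, w_opt))
```

Any other vector that satisfies the constraint can be written as w_opt plus a perturbation d with d†s = 0, and because R is positive definite such a perturbation can only increase the cost.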


Information Capacity of MIMO Channels 


The topic of multiple-input multiple-output (MIMO) links for wireless communications 
was discussed in Chapter 9 on signaling over fading channels. To get a measure of the 
transmission efficiency of MIMO links therein, we resorted to the notion of outage 
capacity, which is naturally of practical interest. However, in light of its mathematical 
sophistication, we deferred discussion of the information capacity of MIMO links rooted 
in Shannon’s information theory to this appendix. 

To be specific, in this appendix we discuss two different aspects of information 
capacity: 

1. The channel state is known to the receiver but not the transmitter;

2. The channel state is known to both the receiver and the transmitter.

The discussion will proceed in this order. 


Log-Det Capacity Formula of MIMO Channels 


Consider a communication channel with multiple antennas. Let the $N_t$-by-1 vector $\mathbf{s}$
denote the transmitted signal vector and the $N_r$-by-1 vector $\mathbf{x}$ denote the received signal
vector. These two vectors are related by the input-output relation of the channel:

x = Hs + w 

where H is the channel matrix of the link and w is the additive channel noise vector. The 
vectors s, w, and x are realizations of the random vectors S, W, and X, respectively. 

In what follows in this appendix, the following assumptions are made: 

1. The channel is stationary and ergodic.

2. The channel matrix $\mathbf{H}$ is made up of iid Gaussian elements.

3. The transmitted signal vector $\mathbf{s}$ has zero mean and correlation matrix $\mathbf{R}_s$.

4. The additive channel noise vector $\mathbf{w}$ has zero mean and correlation matrix $\mathbf{R}_w$.

5. Both $\mathbf{s}$ and $\mathbf{w}$ are governed by Gaussian distributions.

In this section, we also assume that the channel state $\mathbf{H}$ is known to the receiver but not the
transmitter. With both $\mathbf{H}$ and $\mathbf{x}$ unknown to the transmitter, the primary issue of interest is
to determine $I(\mathbf{s};\mathbf{x},\mathbf{H})$, which denotes the mutual information between the transmitted
signal vector $\mathbf{s}$ and both the received signal vector $\mathbf{x}$ and the channel matrix $\mathbf{H}$. Extending
the definition of mutual information introduced in Chapter 5 to the problem at hand, we
write


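$$I(\mathbf{S};\mathbf{X},\mathbf{H}) = \int_{\mathcal{H}} \int_{\mathcal{X}} \int_{\mathcal{S}} f_{\mathbf{S},\mathbf{X},\mathbf{H}}(\mathbf{s},\mathbf{x},\mathbf{H}) \log_2\left(\frac{f_{\mathbf{S}|\mathbf{X},\mathbf{H}}(\mathbf{s}|\mathbf{x},\mathbf{H})}{f_{\mathbf{X},\mathbf{H}}(\mathbf{x},\mathbf{H})}\right) d\mathbf{s}\,d\mathbf{x}\,d\mathbf{H} \tag{E.2}$$

where $\mathcal{S}$, $\mathcal{X}$, and $\mathcal{H}$ are the respective spaces pertaining to the random vectors $\mathbf{S}$ and $\mathbf{X}$ and
matrix $\mathbf{H}$.

Using the definition of a joint probability density function (pdf) as the product of a
conditional pdf and an ordinary pdf, we write

$$f_{\mathbf{S},\mathbf{X},\mathbf{H}}(\mathbf{s},\mathbf{x},\mathbf{H}) = f_{\mathbf{S},\mathbf{X}|\mathbf{H}}(\mathbf{s},\mathbf{x}|\mathbf{H})\,f_{\mathbf{H}}(\mathbf{H})$$

We may therefore rewrite (E.2) in the equivalent form

$$\begin{aligned} I(\mathbf{S};\mathbf{X},\mathbf{H}) &= \int_{\mathcal{H}} f_{\mathbf{H}}(\mathbf{H}) \left[\int_{\mathcal{X}} \int_{\mathcal{S}} f_{\mathbf{S},\mathbf{X}|\mathbf{H}}(\mathbf{s},\mathbf{x}|\mathbf{H}) \log_2\left(\frac{f_{\mathbf{S}|\mathbf{X},\mathbf{H}}(\mathbf{s}|\mathbf{x},\mathbf{H})}{f_{\mathbf{X},\mathbf{H}}(\mathbf{x},\mathbf{H})}\right) d\mathbf{s}\,d\mathbf{x}\right] d\mathbf{H} \\ &= \mathbb{E}_{\mathbf{H}}\left[\int_{\mathcal{X}} \int_{\mathcal{S}} f_{\mathbf{S},\mathbf{X}|\mathbf{H}}(\mathbf{s},\mathbf{x}|\mathbf{H}) \log_2\left(\frac{f_{\mathbf{S}|\mathbf{X},\mathbf{H}}(\mathbf{s}|\mathbf{x},\mathbf{H})}{f_{\mathbf{X},\mathbf{H}}(\mathbf{x},\mathbf{H})}\right) d\mathbf{s}\,d\mathbf{x}\right] \\ &= \mathbb{E}_{\mathbf{H}}[I(\mathbf{s};\mathbf{x}|\mathbf{H})] \end{aligned} \tag{E.3}$$

where the expectation is with respect to the channel matrix $\mathbf{H}$ and

$$I(\mathbf{s};\mathbf{x}|\mathbf{H}) = \int_{\mathcal{X}} \int_{\mathcal{S}} f_{\mathbf{S},\mathbf{X}|\mathbf{H}}(\mathbf{s},\mathbf{x}|\mathbf{H}) \log_2\left(\frac{f_{\mathbf{S}|\mathbf{X},\mathbf{H}}(\mathbf{s}|\mathbf{x},\mathbf{H})}{f_{\mathbf{X},\mathbf{H}}(\mathbf{x},\mathbf{H})}\right) d\mathbf{s}\,d\mathbf{x}$$

is the conditional mutual information between the transmitted signal vector $\mathbf{s}$ and received
signal vector $\mathbf{x}$, given the channel matrix $\mathbf{H}$. However, by assumption, the channel state is
unknown to the transmitter. Therefore, it follows that, insofar as the receiver is concerned,
$I(\mathbf{s};\mathbf{x}|\mathbf{H})$ is a random variable; hence the expectation with respect to $\mathbf{H}$ in (E.3). The quantity
resulting from this expectation is therefore deterministic, defining the mutual information
jointly between the transmitted signal vector $\mathbf{s}$ and both the received signal vector $\mathbf{x}$ and
channel matrix $\mathbf{H}$. The result so obtained is indeed consistent with what we know about
the notion of joint mutual information.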

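Next, applying the vector form of the first line in (5.81) to the mutual information
$I(\mathbf{s};\mathbf{x}|\mathbf{H})$, we have

$$I(\mathbf{s};\mathbf{x}|\mathbf{H}) = h(\mathbf{x}|\mathbf{H}) - h(\mathbf{x}|\mathbf{s},\mathbf{H}) \tag{E.4}$$

where $h(\mathbf{x}|\mathbf{H})$ is the conditional differential entropy of the channel output $\mathbf{x}$ given $\mathbf{H}$, and
$h(\mathbf{x}|\mathbf{s},\mathbf{H})$ is the conditional differential entropy of $\mathbf{x}$, given both $\mathbf{s}$ and $\mathbf{H}$. Both of these
entropies are random quantities, because they both depend on $\mathbf{H}$.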

To proceed further, we now invoke the assumed Gaussianity of both $\mathbf{s}$ and $\mathbf{H}$, in which
case $\mathbf{x}$ also assumes a Gaussian description. Under these circumstances, we may use the
result of Problem 5.32 to express the entropy of the received signal $\mathbf{x}$ of dimension $N_r$,
given $\mathbf{H}$, as

$$h(\mathbf{x}|\mathbf{H}) = N_r + N_r \log_2(2\pi) + \log_2\{\det(\mathbf{R}_x)\} \text{ bits} \tag{E.5}$$

where $\mathbf{R}_x$ is the correlation matrix of $\mathbf{x}$ and $\det(\mathbf{R}_x)$ is its determinant. Recognizing that the
transmitted signal vector $\mathbf{s}$ and channel noise vector $\mathbf{w}$ are independent of each other, we
find from (E.1) that the correlation matrix of the received signal vector $\mathbf{x}$ is given by

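$$\begin{aligned} \mathbf{R}_x &= \mathbb{E}[\mathbf{x}\mathbf{x}^{\dagger}] \\ &= \mathbb{E}[(\mathbf{H}\mathbf{s} + \mathbf{w})(\mathbf{H}\mathbf{s} + \mathbf{w})^{\dagger}] \\ &= \mathbb{E}[(\mathbf{H}\mathbf{s} + \mathbf{w})(\mathbf{s}^{\dagger}\mathbf{H}^{\dagger} + \mathbf{w}^{\dagger})] \\ &= \mathbb{E}[\mathbf{H}\mathbf{s}\mathbf{s}^{\dagger}\mathbf{H}^{\dagger}] + \mathbb{E}[\mathbf{w}\mathbf{w}^{\dagger}], \quad (\mathbb{E}[\mathbf{s}\mathbf{w}^{\dagger}] = \mathbf{0}) \\ &= \mathbf{H}\,\mathbb{E}[\mathbf{s}\mathbf{s}^{\dagger}]\,\mathbf{H}^{\dagger} + \mathbf{R}_w \\ &= \mathbf{H}\mathbf{R}_s\mathbf{H}^{\dagger} + \mathbf{R}_w \end{aligned} \tag{E.6}$$

where $\dagger$ denotes Hermitian transposition,

$$\mathbf{R}_s = \mathbb{E}[\mathbf{s}\mathbf{s}^{\dagger}] \tag{E.7}$$

is the correlation matrix of the transmitted signal vector $\mathbf{s}$, and

$$\mathbf{R}_w = \mathbb{E}[\mathbf{w}\mathbf{w}^{\dagger}] \tag{E.8}$$

is the correlation matrix of the channel noise vector $\mathbf{w}$. Hence, using (E.6) in (E.5), we get

$$h(\mathbf{x}|\mathbf{H}) = N_r + N_r \log_2(2\pi) + \log_2\{\det(\mathbf{R}_w + \mathbf{H}\mathbf{R}_s\mathbf{H}^{\dagger})\} \text{ bits} \tag{E.9}$$

where $N_r$ is the number of elements in the receiving antenna. Next, we note that since the
vectors $\mathbf{s}$ and $\mathbf{w}$ are independent and the sum of $\mathbf{w}$ plus $\mathbf{H}\mathbf{s}$ equals $\mathbf{x}$ as indicated in (E.1),
then the conditional differential entropy of $\mathbf{x}$, given both $\mathbf{s}$ and $\mathbf{H}$, is simply equal to the
differential entropy of the additive channel noise vector $\mathbf{w}$; that is,

$$h(\mathbf{x}|\mathbf{s},\mathbf{H}) = h(\mathbf{w}) \tag{E.10}$$

The entropy $h(\mathbf{w})$ is given by (see Problem 5.32)

$$h(\mathbf{w}) = N_r + N_r \log_2(2\pi) + \log_2\{\det(\mathbf{R}_w)\} \text{ bits} \tag{E.11}$$

Thus, using (E.9), (E.10), and (E.11) in (E.4), we get

$$I(\mathbf{s};\mathbf{x}|\mathbf{H}) = \log_2\{\det(\mathbf{R}_w + \mathbf{H}\mathbf{R}_s\mathbf{H}^{\dagger})\} - \log_2\{\det(\mathbf{R}_w)\} = \log_2\left\{\frac{\det(\mathbf{R}_w + \mathbf{H}\mathbf{R}_s\mathbf{H}^{\dagger})}{\det(\mathbf{R}_w)}\right\} \tag{E.12}$$

As remarked previously, the conditional mutual information $I(\mathbf{s};\mathbf{x}|\mathbf{H})$ is a random variable.
Hence, using (E.12) in (E.3), we finally formulate the ergodic capacity of the MIMO link
as the expectation

$$C = \mathbb{E}_{\mathbf{H}}\left[\log_2\left\{\frac{\det(\mathbf{R}_w + \mathbf{H}\mathbf{R}_s\mathbf{H}^{\dagger})}{\det(\mathbf{R}_w)}\right\}\right] \tag{E.13}$$

which is subject to the constraint

$$\max_{\mathbf{R}_s} \mathrm{tr}[\mathbf{R}_s] \le P$$

where P is the constant transmit power and tr[·] denotes the trace operator, which extracts the
sum of the diagonal elements of the enclosed matrix.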

Equation (E.13) is the desired log-det formula for the ergodic capacity of the MIMO 
link. This formula is of general applicability, in that correlations among the elements of 
the transmitted signal vector s and among those of the channel noise vector w are 
permitted. However, the assumptions made in its derivation involve the Gaussianity of s, 
H, and w. 
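
As a numerical illustration of the log-det formula, the following minimal sketch estimates the expectation by Monte Carlo averaging. It rests on assumptions that are not part of the derivation above: the entries of H are drawn as independent, zero-mean, unit-variance complex Gaussian variables, the noise is white with $\mathbf{R}_w = \sigma^2\mathbf{I}$, and the transmit power is split equally so that $\mathbf{R}_s = (P/N_t)\mathbf{I}$, which satisfies the trace constraint but is not claimed to be the maximizing choice. All function names are hypothetical.

```python
import math
import random

def det(matrix):
    """Determinant of a small square matrix via Gaussian elimination with pivoting."""
    a = [[complex(v) for v in row] for row in matrix]
    n = len(a)
    result = 1 + 0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def log_det_term(H, Rs, Rw):
    """log2 det(Rw + H Rs H^dagger) - log2 det(Rw) for one channel realization."""
    nr, nt = len(H), len(H[0])
    HRs = [[sum(H[i][k] * Rs[k][j] for k in range(nt)) for j in range(nt)]
           for i in range(nr)]
    M = [[Rw[i][j] + sum(HRs[i][k] * H[j][k].conjugate() for k in range(nt))
          for j in range(nr)] for i in range(nr)]
    # Both determinants are real and positive for valid correlation matrices.
    return math.log2(det(M).real) - math.log2(det(Rw).real)

def rayleigh_channel(nr, nt, rng):
    """Assumed channel model: i.i.d. complex Gaussian entries with unit variance."""
    s = math.sqrt(0.5)
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(nt)]
            for _ in range(nr)]

def ergodic_capacity(nr, nt, power, noise_var, trials=2000, seed=1):
    """Monte Carlo estimate of E[log-det term] with equal power per transmit antenna."""
    rng = random.Random(seed)
    Rs = [[power / nt if i == j else 0.0 for j in range(nt)] for i in range(nt)]
    Rw = [[noise_var if i == j else 0.0 for j in range(nr)] for i in range(nr)]
    total = sum(log_det_term(rayleigh_channel(nr, nt, rng), Rs, Rw)
                for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    print(ergodic_capacity(2, 2, power=10.0, noise_var=1.0))
```

The estimate depends on the random seed and the number of trials, so it should be read as an approximation of the expectation rather than an exact value.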

MIMO Capacity for Channel Known at the Transmitter 


The log-det formula of (E.13) for the ergodic capacity of a MIMO flat-fading channel 
assumes that the channel state is only known at the receiver. What if the channel state is 
also known perfectly at the transmitter? Then the channel state becomes known to the 
entire system, which means that we may treat the channel matrix H as a constant. Hence, 
unlike the partially known case treated in Section E.l, there is no longer the need for 
invoking the expectation operator in formulating the log-det capacity. Rather, the problem 
becomes one of constructing the optimal R s (i.e., the correlation matrix of the transmitted 
signal vector s) that maximizes the ergodic capacity. To simplify the construction 
procedure, we consider a MIMO channel for which the number of elements in the 
receiving antenna N r and the number of elements in the transmitting antenna N t have a 
common value, denoted by N. 

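Accordingly, using the assumption of additive white Gaussian noise with variance $\sigma_w^2$ 
in the log-det capacity formula of (E.13), we get 

$$C = \log_2\left\{\det\left(\mathbf{I}_N + \frac{1}{\sigma_w^2}\mathbf{H}\mathbf{R}_s\mathbf{H}^\dagger\right)\right\}\ \text{bits/(s}\cdot\text{Hz)} \tag{E.14}$$

We can now formally postulate the optimization problem at hand as follows: 

Maximize the capacity C of (E.14) with respect to the correlation matrix $\mathbf{R}_s$, subject to 
two constraints: $\mathbf{R}_s$ is nonnegative definite, and the total transmit power is fixed, 

$$\operatorname{tr}[\mathbf{R}_s] = P \tag{E.15}$$

To proceed with construction of the optimal $\mathbf{R}_s$, we first use the determinant identity: 

$$\det(\mathbf{I} + \mathbf{A}\mathbf{B}) = \det(\mathbf{I} + \mathbf{B}\mathbf{A}) \tag{E.16}$$

Application of this identity to (E.14) yields 

$$C = \log_2\left\{\det\left(\mathbf{I}_N + \frac{1}{\sigma_w^2}\mathbf{R}_s\mathbf{H}^\dagger\mathbf{H}\right)\right\} \tag{E.17}$$

Diagonalizing the matrix product $\mathbf{H}^\dagger\mathbf{H}$ by invoking the eigendecomposition of a 
Hermitian matrix, we write 

$$\mathbf{U}^\dagger(\mathbf{H}^\dagger\mathbf{H})\mathbf{U} = \boldsymbol{\Lambda} \tag{E.18}$$

where $\boldsymbol{\Lambda}$ is a diagonal matrix made up of the eigenvalues of $\mathbf{H}^\dagger\mathbf{H}$, and U is a unitary 
matrix whose columns are the associated eigenvectors. We may therefore rewrite (E.18) 
in the equivalent form 

$$\mathbf{H}^\dagger\mathbf{H} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\dagger \tag{E.19}$$

where we have used the fact that, U being unitary, the matrix product $\mathbf{U}\mathbf{U}^\dagger$ is equal to the 
identity matrix. Substituting (E.19) into (E.17), we get 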


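$$C = \log_2\left\{\det\left(\mathbf{I}_N + \frac{1}{\sigma_w^2}\mathbf{R}_s\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\dagger\right)\right\} \tag{E.20}$$

Next, applying the determinant identity of (E.16) to the formula, we get 

$$\begin{aligned}
C &= \log_2\left\{\det\left(\mathbf{I}_N + \frac{1}{\sigma_w^2}\boldsymbol{\Lambda}\mathbf{U}^\dagger\mathbf{R}_s\mathbf{U}\right)\right\} \\
&= \log_2\left\{\det\left(\mathbf{I}_N + \frac{1}{\sigma_w^2}\boldsymbol{\Lambda}\bar{\mathbf{R}}_s\right)\right\}\ \text{bits/(s}\cdot\text{Hz)}
\end{aligned} \tag{E.21}$$

where 

$$\bar{\mathbf{R}}_s = \mathbf{U}^\dagger\mathbf{R}_s\mathbf{U} \tag{E.22}$$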

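Note that the transformed correlation matrix $\bar{\mathbf{R}}_s$ is nonnegative definite. Since $\mathbf{U}\mathbf{U}^\dagger = \mathbf{I}$, 
we also have 

$$\begin{aligned}
\operatorname{tr}[\bar{\mathbf{R}}_s] &= \operatorname{tr}[\mathbf{U}^\dagger\mathbf{R}_s\mathbf{U}] \\
&= \operatorname{tr}[\mathbf{U}\mathbf{U}^\dagger\mathbf{R}_s] \\
&= \operatorname{tr}[\mathbf{R}_s]
\end{aligned} \tag{E.23}$$

where, in the second line, we used the equality tr[AB] = tr[BA]. It follows, therefore, that 
maximization of the ergodic capacity of (E.21) can be carried out equally well over the 
transformed correlation matrix $\bar{\mathbf{R}}_s$. 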

One other important point to note is that any nonnegative definite matrix A satisfies the 
Hadamard inequality 

$$\det(\mathbf{A}) \le \prod_k a_{kk} \tag{E.24}$$

where the $a_{kk}$ are the diagonal elements of matrix A. Hence, applying this inequality to the 
determinant term in (E.21), we may write 

$$\det\left(\mathbf{I}_N + \frac{1}{\sigma_w^2}\boldsymbol{\Lambda}\bar{\mathbf{R}}_s\right) \le \prod_{k=1}^{N}\left(1 + \frac{1}{\sigma_w^2}\lambda_k\bar{r}_{s,kk}\right) \tag{E.25}$$

where $\lambda_k$ is the kth eigenvalue of the matrix product $\mathbf{H}^\dagger\mathbf{H}$ and $\bar{r}_{s,kk}$ is the kth diagonal 
element of the transformed matrix $\bar{\mathbf{R}}_s$. Equality in (E.25) holds only when $\bar{\mathbf{R}}_s$ is a diagonal 
matrix, which is the very condition that maximizes the ergodic capacity C. 

To proceed further, we now use (E.21) and (E.25) with the equality sign to express the 
ergodic capacity as 

$$\begin{aligned}
C &= \log_2\prod_{k=1}^{N}\left(1 + \frac{1}{\sigma_w^2}\lambda_k\bar{r}_{s,kk}\right) \\
&= \sum_{k=1}^{N}\log_2\left(1 + \frac{1}{\sigma_w^2}\lambda_k\bar{r}_{s,kk}\right) \\
&= \sum_{k=1}^{N}\log_2\left\{\lambda_k\left(\lambda_k^{-1} + \frac{1}{\sigma_w^2}\bar{r}_{s,kk}\right)\right\} \\
&= \sum_{k=1}^{N}\log_2\lambda_k + \sum_{k=1}^{N}\log_2\left(\lambda_k^{-1} + \frac{1}{\sigma_w^2}\bar{r}_{s,kk}\right)
\end{aligned} \tag{E.26}$$

where only the second sum term is clearly adjustable through $\bar{r}_{s,kk}$. We may therefore 
reformulate the optimization problem at hand as follows: 

Maximize the second sum in (E.26) with respect to the $\bar{r}_{s,kk}$, subject to $\bar{r}_{s,kk} \ge 0$ for all k 
and the global power constraint 

$$\sum_{k=1}^{N}\bar{r}_{s,kk} = P \tag{E.27}$$

The global power constraint of (E.27) follows from (E.23) and the definition of the 
trace: 

$$\operatorname{tr}[\bar{\mathbf{R}}_s] = \sum_{k=1}^{N}\bar{r}_{s,kk} \tag{E.28}$$

The solution to the reformulated optimization problem that was initiated after (E.14) may 
be determined through the discrete spatial version of the water-filling procedure, which is 
described in Chapter 5. Effectively, the solution to the water-filling problem says that, in a 
multiple-channel scenario, we transmit more signal power in the better channels and less 
signal power in the poorer channels. To be specific, imagine a vessel whose bottom is 
defined by the set of N dimensionless discrete levels 

$$\frac{\sigma_w^2}{\lambda_1},\ \frac{\sigma_w^2}{\lambda_2},\ \ldots,\ \frac{\sigma_w^2}{\lambda_N}$$

and pour “water” into the vessel in an amount corresponding to the total transmit power P. 
The power P is optimally divided among the N eigenmodes of the MIMO link in 
accordance with their corresponding “water levels” in the vessel, as illustrated in Figure 
E.1 for a MIMO link with N = 6. The “water-fill level,” denoted by the dimensionless 
parameter $\mu$ and indicated by the dashed line in the figure, is chosen to satisfy the 
constraint of (E.27). On the basis of the spatially discrete water-filling picture portrayed in 
Figure E.1, we may now finally postulate the optimal $\bar{r}_{s,kk}$ to be 

$$\bar{r}_{s,kk} = \left(\mu - \frac{\sigma_w^2}{\lambda_k}\right)^{+}, \qquad k = 1, 2, \ldots, N \tag{E.29}$$

The superscript “+” applied to the right parenthesis in (E.29) signifies retaining only those 
terms in the right-hand side of the equation that are positive (i.e., the terms that pertain to 
those eigenmodes of the MIMO link for which the water levels lie below the constant $\mu$). 



Figure E.1 Water-filling interpretation of the optimization procedure. 

We may thus finally state that if the channel matrix H is known to both the transmitter 
and the receiver of a MIMO link with $N_r = N_t = N$, then the maximum value of the 
capacity of the MIMO link is defined by 

$$C = \sum_{k=1}^{N}\left[\log_2\left(\frac{\mu\lambda_k}{\sigma_w^2}\right)\right]^{+}\ \text{bits/(s}\cdot\text{Hz)} \tag{E.30}$$

where, as stated previously, the constant $\mu$ is chosen to satisfy the global power constraint 
of (E.27). 
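
The water-filling procedure can be sketched in a few lines of code. The sketch below assumes that the eigenvalues of $\mathbf{H}^\dagger\mathbf{H}$ are already available and that the floor of the vessel for the kth eigenmode is the noise variance divided by the kth eigenvalue; the function names and the numerical values are hypothetical. It searches for the number of active eigenmodes for which the water-fill level clears every active floor, then allocates the power above each floor.

```python
import math

def water_fill(eigenvalues, total_power, noise_var):
    """Return (mu, powers) with powers[k] = max(mu - noise_var/eigenvalues[k], 0)
    and sum(powers) equal to total_power."""
    levels = sorted(noise_var / lam for lam in eigenvalues if lam > 0)
    if not levels or total_power <= 0:
        raise ValueError("need a positive eigenvalue and positive power")
    # Try all modes first, then drop the worst mode until the level clears the floors.
    for m in range(len(levels), 0, -1):
        mu = (total_power + sum(levels[:m])) / m
        if mu > levels[m - 1]:
            break
    powers = [max(mu - noise_var / lam, 0.0) if lam > 0 else 0.0
              for lam in eigenvalues]
    return mu, powers

def capacity(eigenvalues, powers, noise_var):
    """Sum over eigenmodes of log2(1 + lambda_k * r_k / noise_var)."""
    return sum(math.log2(1 + lam * p / noise_var)
               for lam, p in zip(eigenvalues, powers))

def capacity_from_level(eigenvalues, mu, noise_var):
    """Same capacity written through the water-fill level; only active modes count."""
    return sum(max(math.log2(mu * lam / noise_var), 0.0)
               for lam in eigenvalues if lam > 0)

if __name__ == "__main__":
    lams = [4.0, 1.0, 0.25]          # illustrative eigenvalues, not from the text
    mu, r = water_fill(lams, total_power=2.0, noise_var=1.0)
    print(mu, r, capacity(lams, r, 1.0))
```

In this illustrative case the weakest eigenmode receives no power, because its floor lies above the water-fill level; the two capacity functions agree because $1 + \lambda_k(\mu - \sigma^2/\lambda_k)/\sigma^2 = \mu\lambda_k/\sigma^2$ for every active mode.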

Notes 


1. The first detailed derivation of the log-det capacity formula for a stationary MIMO channel was 
presented by Telatar in an AT&T technical memorandum published in 1995 and republished as a 
journal paper (Telatar, 1999). 

2. Given a complex-valued matrix A, the eigendecomposition of A is defined by $\mathbf{U}^\dagger\mathbf{A}\mathbf{U} = \boldsymbol{\Lambda}$. 

Interleaving 

Previous chapters of the book, going back to Chapter 5, have shown us how a digital 
wireless communication system can be separated by function into source-coding and 
channel-coding applications on the transmitting side and the corresponding inverse 
functions on the receiving side. In Chapter 6, we also learned how analog signals can be 
converted into a digital format. The motivation behind these techniques is to minimize the 
amount of information that has to be transmitted over a wireless channel. Such 
minimization has potential benefits in the allocation of two primary resources, namely 
transmit power and channel bandwidth, available to wireless communications: 

• Reducing the amount of data that must be transmitted, which usually means that less 
power has to be consumed; power consumption is always a serious concern for 
mobile units that are typically battery operated. 

• Reducing the spectral (or radio-frequency) resources, which are required for 
satisfactory performance; this reduction enables us to increase the number of users 
who can share the same but limited channel bandwidth. 

Moreover, insofar as channel coding is concerned, forward error-correction (FEC) coding, 
discussed in Chapter 10, provides a powerful technique for transmitting 
information-bearing data reliably from a source to a sink across the wireless channel. 

However, to obtain the maximum benefit from FEC coding in wireless 
communications, we require an additional technique known as interleaving. The need for 
this new technique is justified on the grounds that, in light of the material presented in 
Chapter 9, we know that wireless channels have memory due to multipath fading that 
results from the arrival of signals at the receiver via multiple propagation paths of different 
lengths. Of particular concern is fast fading, which arises out of reflections from objects in 
the local vicinity of the transmitter, the receiver, or both. The term fast refers to the speed 
of fluctuations in the received signal due to these reflections, relative to the speeds of other 
propagation phenomena. Compared with transmit data rates, even fast fading can be 
relatively slow. That is, fast fading can be approximately constant over a number of 
transmission symbols, depending upon the data transmission speed and the mobile unit’s 
velocity. Consequently, fast fading may be viewed as a time-correlated form of channel 
impairment, the presence of which results in statistical dependence among contiguous 
(sets of) symbol transmissions. That is, instead of being isolated events, transmission 
errors due to fast fading tend to occur in bursts. 

Now, most FEC channel codes are designed to deal with a limited number of bit errors, 
assumed to be randomly distributed and statistically independent from one bit to the next. 
To be specific, in Section 10.8 on convolutional decoding, we indicated that the Viterbi 
algorithm, as powerful as it is, will fail if there are $d_{\text{free}}/2$ closely spaced bit errors in the 
received signal, where $d_{\text{free}}$ is the free distance of the convolutional code. Accordingly, in 
the design of a reliable wireless communication system, we are confronted with two 
conflicting phenomena: 

• a wireless channel that produces bursts of correlated bit errors; 

• a convolutional decoder that cannot handle error bursts. 

Interleaving is an indispensable technique for resolving these two conflicting phenomena. 
First and foremost, however, it is important to note that for interleaving we do not need the 
exact statistical characterization of the wireless channel. Rather, we only require 
knowledge of the coherence time for fast fading, which is approximately given by (see 
(9.46)) 

$$\tau_{\text{coherence}} \approx \frac{0.3}{2\nu_{\max}}$$

where $\nu_{\max}$ is the maximum Doppler shift. Consequently, we would expect an error burst 
to occupy typically a time duration equal to $\tau_{\text{coherence}}$. To deal with bad situations of this 
kind in wireless communications, we do two things: 

• An interleaver (i.e., a device that performs interleaving) is used to randomize the 
order of encoded bits after the channel encoder in the transmitter. 

• A de-interleaver (i.e., a device that performs de-interleaving) is used to undo the 
randomization before the data reach the channel decoder in the receiver. 

Interleaving has the net effect of breaking up any error bursts that may occur during the 
course of data transmission over the wireless channel and spreading them over the 
duration of operation of the interleaver. In so doing, the likelihood of a correctable 
received sequence is significantly improved. In the transmitter, the interleaver is placed 
after the channel encoder; in the receiver, the de-interleaver is placed before the channel 
decoder. 

Three types of interleaving are commonly used in practice, and are discussed next. 

Block Interleaving 


In basic terms, a classical block interleaver acts as a memory buffer, as shown in Figure 
F.1. Data are written into this N × L rectangular array from the channel encoder in column 
fashion. Once the array is filled, it is read out in row fashion and its contents are sent to the 
transmitter. At the receiver, the inverse operation is performed: the array in the receiver 
is filled row-wise with the received data; once the array is filled, it is read out 
columnwise into the decoder. Note that the (N, L) interleaver and de-interleaver described herein 
are both periodic with the fundamental period T = NL. 

Suppose the correlation time or error-burst-length time corresponds to L received bits. 
Then, at the receiver, we expect that the effect of an error burst would corrupt the 
equivalent of one row of the de-interleaver block. However, since the de-interleaver block 
is read columnwise, all of these “bad” bits would be separated by N - 1 “good” bits when 
the burst is read into the decoder. If N is greater than the constraint length of the 
convolutional code being employed, then the Viterbi decoder will correct all of the errors 
in the error burst. 
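
The write-by-column, read-by-row operation and its effect on an error burst can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes an array of N rows and L columns held as a flat list, and it marks a burst of L consecutive transmitted symbols that occupies exactly one row.

```python
def block_interleave(symbols, n_rows, n_cols):
    """Write the input column by column into an n_rows x n_cols array,
    then read it out row by row."""
    if len(symbols) != n_rows * n_cols:
        raise ValueError("block length must equal n_rows * n_cols")
    return [symbols[c * n_rows + r] for r in range(n_rows) for c in range(n_cols)]

def block_deinterleave(symbols, n_rows, n_cols):
    """Write the received block row by row, then read it out column by column."""
    if len(symbols) != n_rows * n_cols:
        raise ValueError("block length must equal n_rows * n_cols")
    return [symbols[r * n_cols + c] for c in range(n_cols) for r in range(n_rows)]

def burst_positions_after_deinterleaving(start, length, n_rows, n_cols):
    """Positions, at the decoder input, of a burst hitting transmitted
    symbols start, start + 1, ..., start + length - 1."""
    hit = [start <= i < start + length for i in range(n_rows * n_cols)]
    return [i for i, flag in enumerate(block_deinterleave(hit, n_rows, n_cols)) if flag]

if __name__ == "__main__":
    N, L = 4, 6                      # illustrative array size, not from the text
    data = list(range(N * L))
    sent = block_interleave(data, N, L)
    assert block_deinterleave(sent, N, L) == data
    # A burst of L symbols covering the second row of the array:
    print(burst_positions_after_deinterleaving(L, L, N, L))
```

With these illustrative sizes, the corrupted positions at the decoder input are spaced N apart, that is, separated by N - 1 uncorrupted symbols, in line with the argument above.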

In practice, owing to the frequency of error bursts and the presence of other errors 
caused by channel noise, the interleaver should ideally be made as large as possible. 
However, an interleaver introduces delay into the transmission of the message signal, in 
that we must fill the N × L array before it can be transmitted. This is an issue of particular 
concern in real-time applications such as voice, because it limits the usable block size of 
the interleaver and necessitates a compromise solution. 


Figure F.1 Block interleaver structure. (a) Data “read in.” (b) Data “read out.” 


Interleaving 

Figure F.2a depicts an original sequence of encoded words, with each word consisting of 
five symbols. Figure F.2b depicts the interleaved version of the encoded sequence, with 
the symbols shown in reordered positions. An error burst occupying five symbols, caused 
by channel impairment, is also shown alongside Figure F.2b. Note that the manner in 
which the encoded symbols are reordered by the interleaver is the same from one word to 
the next. 

Figure F.2 Interleaving example. (a) Original sequence. (b) Interleaved sequence. (c) De-interleaved sequence. 

On de-interleaving in the receiver, the scrambling of symbols is undone, yielding a 
sequence that resembles the original sequence of encoded symbols, as shown in Figure 
F.2c. This figure also includes the new positions of the transmission errors. The important 
point to note here is that the error burst is dispersed as a result of de-interleaving. 

This example teaches us the following: 

• The burst of transmission errors is only acted upon by the de-interleaver. 

• Insofar as the encoded symbols that are received are error free, the de-interleaver 
cancels the scrambling action of the interleaver. 


Convolutional Interleaving 


The block diagram of a convolutional interleaver/de-interleaver is shown in Figure F.3. 
Defining the period 

T = LN 

the interleaver is referred to as an (L × N) convolutional interleaver, which has properties 
similar to those of the (L × N) block interleaver. 

The sequence of encoded bits to be interleaved in the transmitter is arranged in blocks 
of L bits. For each block, the encoded bits are sequentially shifted into and out of a bank of 
N registers by means of two synchronized input and output commutators. The interleaver, 
depicted in Figure F.3a, is structured as follows: 

• The zeroth shift register provides no storage; that is, the incoming encoded symbol 
is transmitted immediately. 

• Each successive shift register provides a storage capacity of L symbols more than 
the preceding shift register. 

• Each shift register is visited regularly on a periodic basis. 

With each new encoded symbol, the commutators switch to a new shift register. The new 
symbol is shifted into the register and the oldest symbol stored in that register is shifted 
out. After finishing with the (N - 1)th shift register (i.e., the last register), the commutators 
return to the zeroth shift register. Thus, the switching/shifting procedure is repeated 
periodically on a regular basis. 

The de-interleaver in the receiver also uses N shift registers and a pair of input/output 
commutators that are synchronized with those in the interleaver. Note, however, the shift 
registers are stacked in the reverse order to those in the interleaver, as shown in Figure 
F.3b. The net result is that the de-interleaver in the receiver performs the inverse operation 
to interleaving in the transmitter, and so it should. 

An advantage of convolutional over block interleaving is that in convolutional 
interleaving the total end-to-end delay is L(N - 1) symbols and the memory requirement is 
L(N - 1)/2 in both the interleaver and de-interleaver, which are one-half of the 
corresponding values in a block interleaver/de-interleaver for a similar level of 
interleaving. 

The description of the convolutional interleaver/de-interleaver in Figure F.3b is 
presented in terms of shift registers. The actual implementation of the system can also be 
accomplished with a random access memory unit in place of shift registers. This 
alternative implementation simply requires that access to the memory units be 
appropriately controlled. 

Random Interleaving 


In a random interleaver, a block of N input bits is written into the interleaver in the order 
in which they are received, but they are read out in a random manner. Typically, the 
permutation of the input bits is defined by a uniform distribution. Let $\pi(i)$ denote the 
permuted location of the ith input bit, where i = 1, 2, ..., N. The set of integers denoted by 
$\{\pi(i)\}_{i=1}^{N}$, defining the order in which the stored input bits are read out of the 
interleaver, is generated according to the following two-step algorithm: 

1. Choose an integer $i_1$ from the uniformly distributed set $\mathcal{A} = \{1, 2, \ldots, N\}$, with the 
probability of choosing $i_1$ being $p(i_1) = 1/N$. The chosen integer $i_1$ is set to be $\pi(1)$. 

2. For k > 1, choose an integer $i_k$ from the uniformly distributed set 

$$\mathcal{A}_k = \{i \in \mathcal{A},\ i \ne i_1, i_2, \ldots, i_{k-1}\}$$

with the probability of choosing $i_k$ being $p(i_k) = 1/(N - k + 1)$. The chosen integer $i_k$ 
is set to be $\pi(k)$. Note that the size of the set $\mathcal{A}_k$ is progressively reduced for k > 1. 
When k = N, we are left with a single integer $i_N$, in which case $i_N$ is set to be $\pi(N)$. 

To be of practical use in communications, random interleavers are configured to be 
pseudo-random, meaning that within a block of N input bits the permutation is random as 
described above, but the permutation order is exactly the same from one block to the next. 
Accordingly, pseudo-random interleavers are designed off-line; they are of particular 
interest in the construction of turbo codes, discussed in Chapter 10. 
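
The two-step algorithm lends itself to a short sketch. The code below is a minimal illustration with hypothetical function names; it assumes that a seeded generator from Python's standard `random` module is an acceptable source of uniform choices, and it uses the fixed seed to mimic the off-line design of a pseudo-random interleaver, so that the same permutation is applied to every block.

```python
import random

def random_permutation(n, rng):
    """Generate pi(1), ..., pi(N): at step k, pick uniformly among the
    N - k + 1 integers of {1, ..., N} that have not been chosen yet."""
    remaining = list(range(1, n + 1))
    pi = []
    for _ in range(n):
        index = rng.randrange(len(remaining))   # each survivor equally likely
        pi.append(remaining.pop(index))
    return pi

def interleave(block, pi):
    """The kth output is the stored input bit at location pi(k) (1-based)."""
    return [block[p - 1] for p in pi]

def deinterleave(block, pi):
    """Undo the permutation: put the kth received bit back at location pi(k)."""
    restored = [None] * len(block)
    for k, p in enumerate(pi):
        restored[p - 1] = block[k]
    return restored

if __name__ == "__main__":
    # Fixed (arbitrary) seed: the permutation is designed once and then reused.
    pi = random_permutation(8, random.Random(2024))
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    assert deinterleave(interleave(bits, pi), pi) == bits
    print(pi)
```

The transmitter and receiver must share the same permutation, which is why the seed (or the permutation table itself) is fixed in advance rather than drawn afresh for each block.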

Notes 


1. Interleaving of both the block and convolutional types is discussed in some detail in Clark and 
Cain (1981) and in lesser detail in Sklar (2001). For a treatment of interleaving viewed from the 
perspective of turbo codes, see the book (Vucetic and Yuan, 2000). 


The Peak-Power Reduction Problem in OFDM 


In Section 9.11 we discussed the multicarrier transmission technique, namely orthogonal 
frequency-division multiplexing (OFDM), which is of particular importance to wireless 
communications due to the computational benefits offered by the fast Fourier transform 
(FFT) algorithm. However, envelope variations are a frequently cited drawback of OFDM 
because of the peak-power limited problem. This problem arises due to the statistical 
possibility of a large number of independent subchannels in the OFDM becoming 
constructively superimposed, thereby resulting in high peaks. In the literature, the 
practical issue of envelope variations is described in terms of the peak-to-average power 
ratio, commonly abbreviated as PAPR. 

In this section, we discuss the PAPR problem in wireless communications and how it 
can be reduced. 

PAPR Properties of OFDM Signals 


Consider a single modulation interval, that is, a single symbol of OFDM, the duration of 
which is denoted by $T_s$. In its most basic form, the transmitted OFDM signal is described by 

$$s(t) = \sum_{n=0}^{N-1} s_n \exp(j2\pi n\Delta f t), \qquad 0 \le t \le T_s \tag{G.1}$$

where the term $\Delta f$ denotes the frequency separation between any two adjacent 
subchannels in the OFDM. By definition, the frequency separation $\Delta f$ and symbol 
duration $T_s$ are related by the time-bandwidth product: 

$$T_s\Delta f = 1 \tag{G.2}$$

This condition is required to satisfy the orthogonality requirement among the N 
subchannels of the OFDM. 

Typically, the coefficients in OFDM, denoted by $s_n$ in (G.1), are taken from a fixed 
modulation constellation, exemplified by M-ary phase-shift keying (PSK) or M-ary 
quadrature amplitude modulation (QAM) techniques, which were discussed in Chapter 7. 
With s(t), in its baseband form, being a complex-valued signal with an amplitude and 
phase that characterize it, we may express the time-averaged power of an individual 
symbol of the OFDM signal in (G.1) as follows: 

$$\begin{aligned}
P &= \frac{1}{T_s}\int_0^{T_s}|s(t)|^2\,dt \\
&= \sum_{n=0}^{N-1}|s_n|^2
\end{aligned} \tag{G.3}$$

where the summation in the second line of the equation follows from Parseval’s theorem, 
discussed in Chapter 2. With the OFDM coefficient $s_n$ being a random variable, which it is 
in a wireless environment, it follows that the time-averaged power P is itself a random 
variable. It follows therefore that the ensemble-averaged power of the OFDM signal is 
given by the expectation 

$$\begin{aligned}
P_{\text{av}} &= \mathbb{E}[P] \\
&= \mathbb{E}[|s(t)|^2] \quad \text{for } 0 \le t \le T_s
\end{aligned} \tag{G.4}$$

In an OFDM signal based on M-ary PSK, for example, we have $|s_n| = 1$ for all n. In this 
special case, (G.4) yields 

$$P_{\text{av}} = N \tag{G.5}$$
As pointed out previously, the metric of interest commonly used in the literature for 
assessing the issue of statistical peak-power variations in the use of OFDM for wireless 
communications is the peak-to-average power ratio (PAPR), for which we offer the 
following definition: 

$$\xi = \frac{\max\limits_{0 \le t \le T_s}|s(t)|^2}{P_{\text{av}}} = \frac{\max\limits_{0 \le t \le T_s}|s(t)|^2}{\mathbb{E}[|s(t)|^2]} \tag{G.6}$$

where, in words, the term in the numerator denotes the maximum value of the 
instantaneous power (i.e., peak power) of the OFDM signal measured across the symbol 
interval, $0 \le t \le T_s$, and the denominator denotes average power, hence PAPR. The 
formula used in (G.6) refers to the baseband formulation of the PAPR problem. 

Recognizing that PAPR is, in reality, a random variable distributed across each OFDM 
symbol, a statistical interpretation of it is useful. To this end, we may express the 
probability that the PAPR of an OFDM symbol, denoted by s(t) as defined in (G.1), 
exceeds a prescribed value $\xi_p$ as follows: 

$$\mathbb{P}(\xi > \xi_p) = P_c \tag{G.7}$$

To expand on this definition, we say that the PAPR is less than the prescribed value $\xi_p$ 
for $100(1 - P_c)$ percent of the OFDM symbols, in which case we may refer to $\xi_p$ as the 
$100(1 - P_c)$ percentile PAPR. 
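
The PAPR of a single OFDM symbol, and the probability that it exceeds a prescribed value, can be estimated numerically. The sketch below is an illustration under stated assumptions: the symbol is evaluated only on a uniform grid of oversampled time instants (so the continuous-time peak may be slightly underestimated), the subchannel coefficients are drawn independently and uniformly from an M-ary PSK constellation, and the average power is therefore taken as the number of subchannels. Function names and default parameters are hypothetical.

```python
import cmath
import math
import random

def ofdm_samples(symbols, oversample=4):
    """Samples of s(t) = sum_n s_n exp(j 2 pi n df t) on 0 <= t < T_s,
    using T_s * df = 1 so that df * t = m / n_pts at the mth sample."""
    n_sub = len(symbols)
    n_pts = oversample * n_sub
    return [
        sum(s * cmath.exp(2j * math.pi * n * m / n_pts) for n, s in enumerate(symbols))
        for m in range(n_pts)
    ]

def papr(symbols, average_power, oversample=4):
    """Peak instantaneous power over the sampled symbol divided by average power."""
    peak_power = max(abs(x) ** 2 for x in ofdm_samples(symbols, oversample))
    return peak_power / average_power

def psk_symbols(n_sub, m_ary, rng):
    """Independent, uniformly chosen unit-modulus M-ary PSK coefficients."""
    return [cmath.exp(2j * math.pi * rng.randrange(m_ary) / m_ary) for _ in range(n_sub)]

def exceedance_probability(threshold_db, n_sub=16, m_ary=4, trials=500, seed=7):
    """Fraction of simulated OFDM symbols whose PAPR exceeds threshold_db."""
    rng = random.Random(seed)
    threshold = 10 ** (threshold_db / 10)
    hits = sum(papr(psk_symbols(n_sub, m_ary, rng), n_sub) > threshold
               for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for level_db in (4.0, 6.0, 8.0):
        print(level_db, exceedance_probability(level_db))
```

The printed fractions are Monte Carlo estimates that depend on the seed and the number of trials; they illustrate the trend that higher thresholds are exceeded less often, not a precise distribution.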


Maximum PAPR in OFDM Using M-ary PSK 


Consider an OFDM system based on M-ary PSK for its modulation scheme. For this 
special application of OFDM, the PAPR is always less than or equal to N, where N is the 
number of subchannels. To justify this statement, we first note that for M-ary PSK, 

$$|s_n| = 1 \quad \text{for } 0 \le n \le N - 1 \tag{G.8}$$

Hence, since the peak power can never be less than the average power, the PAPR is lower bounded as follows: 

$$\xi \ge 1 \tag{G.9}$$

For the upper bound on the PAPR under M-ary PSK, we may write: 

$$\xi = \frac{1}{N}\max_{0 \le t \le T_s}\left|\sum_{n=0}^{N-1} s_n \exp(j2\pi n\Delta f t)\right|^2 \le \frac{1}{N}\left(\sum_{n=0}^{N-1}|s_n|\right)^2 = \frac{N^2}{N} = N \tag{G.10}$$

We may therefore go on to say: 

For OFDM using M-ary PSK, the PAPR is bounded as $1 \le \xi \le N$, where N is the number of subchannels. 

PAPR for OFDM Using M-ary PSK 

Consider the example of an OFDM system using M-ary PSK for which M = 8 and the 
number of subchannels is 

$$N = 2^8 = 256$$

For such an OFDM system, the upper bound on the PAPR, expressed in decibels, can be as 
high as the value 

$$\begin{aligned}
10\log_{10}(256) &= 10 \times 8 \times \log_{10}(2) \\
&= 10 \times 8 \times 0.301 \\
&= 24.08\ \text{dB}
\end{aligned}$$

The probability that the PAPR attains such an upper bound is inversely proportional to $2^N$, 
where N is the number of subchannels. It follows that, fortunately in practice, the 
probability that the upper bound in (G.10) is attained is negligibly small when N is large 
(Tellambura and Friese, 2006). 
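
The arithmetic of this example, and the fact that the upper bound is actually reached when all subchannel coefficients are identical, can be checked with a few lines. This is a minimal sketch with hypothetical function names; it assumes unit-modulus coefficients all equal to 1, so that every term of the OFDM sum equals 1 at t = 0.

```python
import math

def papr_upper_bound_db(n_sub):
    """Upper bound N on the PAPR for M-ary PSK, expressed in decibels."""
    return 10 * math.log10(n_sub)

def peak_power_all_equal(n_sub):
    """Peak power |s(0)|^2 when s_n = 1 for every n: the N terms add in phase."""
    s_at_zero = sum(1 + 0j for _ in range(n_sub))
    return abs(s_at_zero) ** 2

if __name__ == "__main__":
    n_sub = 256
    print(round(papr_upper_bound_db(n_sub), 2), "dB")
    # Peak power N^2 divided by average power N gives a PAPR of N.
    print(peak_power_all_equal(n_sub) / n_sub)
```

Only a handful of the possible coefficient sequences line up in phase in this way, which is consistent with the remark that the bound is rarely attained for large N.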


Clipping-Filtering: A Technique for PAPR Reduction 


From the discussion just presented, we clearly see the need for reducing the PAPR for 
commercial viability of OFDM in wireless communications. 

Considering the nature of envelope variations in the OFDM signal s(t) (these are 
responsible for the PAPR problem), an obvious approach for addressing this problem is to 
do the following: 

• First, clip s(t), such that its envelope is limited to a certain desired maximum value. 

• Second, use a linear filter so as to reduce the distortion produced by the clipping. 

A system configuration for this PAPR-reduction scheme may proceed as follows: the 
OFDM modulator constitutes the first functional block of the system, followed by the 
envelope-peak clipper, then a linear filter, and finally an up-converter for translating the 
complex baseband signal into a real-valued RF signal ready for transmission over the 
wireless channel. 

For the clipping, we may consider two types of nonlinear devices: complex baseband 
hard clipper and high-power transistor amplifier. Now, when a modulated signal is passed 
through a nonlinear device, two forms of distortion arise, namely 

• amplitude modulation-to-phase modulation (AM/PM) conversion and 

• amplitude modulation-to-amplitude modulation (AM/AM) conversion. 

The above-mentioned nonlinear devices are of practical interest because the AM/PM 
conversion can be eliminated almost completely through the use of a suitable pre-distorter. 
However, the AM/AM conversion remains an issue of concern. Specifically, the 
process of AM/AM conversion results in the production of two kinds of distortion: 

• out-of-band (OOB) distortion and 

• in-band (IB) distortion, 

which are related; in any event, they both can be viewed as another source of noise. The IB 
noise cannot be reduced by filtering and, therefore, results in a degradation of error 
performance. The OOB noise can be reduced by filtering, but the filtering also causes the 
“regrowth” of some original peaks. To reduce the overall regrowth of signal peaks, we may 
repeat the operation of clipping followed by filtering. 
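
One way to picture repeated clipping and filtering is the sketch below. It is not taken from the cited literature; it assumes a complex baseband hard clipper that limits the envelope while preserving phase, and an idealized filter that simply discards every frequency bin outside the N subchannels of an oversampled symbol. The clipping level is set relative to the rms value of the unclipped symbol, and all function names and parameters are hypothetical. A direct (slow) discrete Fourier sum is used to keep the example self-contained.

```python
import cmath
import math
import random

def to_time(coeffs, n_pts):
    """Oversampled time samples of sum_n c_n exp(j 2 pi n m / n_pts)."""
    return [sum(c * cmath.exp(2j * math.pi * n * m / n_pts) for n, c in enumerate(coeffs))
            for m in range(n_pts)]

def to_in_band(samples, n_sub):
    """Coefficients of the n_sub in-band bins; out-of-band bins are dropped,
    which plays the role of an ideal filter."""
    n_pts = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * n * m / n_pts) for m, x in enumerate(samples)) / n_pts
            for n in range(n_sub)]

def clip(samples, max_amplitude):
    """Hard envelope clipper: limit the magnitude, keep the phase."""
    return [x if abs(x) <= max_amplitude else max_amplitude * x / abs(x) for x in samples]

def clip_and_filter(symbols, clip_ratio_db, iterations=1, oversample=4):
    """Repeat clipping followed by ideal in-band filtering."""
    n_sub = len(symbols)
    n_pts = oversample * n_sub
    rms = math.sqrt(sum(abs(s) ** 2 for s in symbols))   # Parseval: rms of s(t)
    max_amplitude = rms * 10 ** (clip_ratio_db / 20)
    coeffs = list(symbols)
    for _ in range(iterations):
        coeffs = to_in_band(clip(to_time(coeffs, n_pts), max_amplitude), n_sub)
    return coeffs

def papr_db(coeffs, oversample=4):
    """Peak-to-mean power of one sampled symbol, in dB (per-symbol mean power)."""
    samples = to_time(coeffs, oversample * len(coeffs))
    mean_power = sum(abs(x) ** 2 for x in samples) / len(samples)
    return 10 * math.log10(max(abs(x) ** 2 for x in samples) / mean_power)

if __name__ == "__main__":
    rng = random.Random(11)
    qpsk = [cmath.exp(2j * math.pi * (rng.randrange(4) + 0.5) / 4) for _ in range(16)]
    print(papr_db(qpsk), papr_db(clip_and_filter(qpsk, 3.0, iterations=3)))
```

Because the in-band part of the clipping distortion survives the filter, the returned coefficients differ from the original constellation points; this is the IB distortion referred to above, and the amount of PAPR reduction obtained varies from symbol to symbol.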

As mentioned previously, high peak values are extremely rare; in particular, a PAPR 
greater than 14 dB is almost impossible. Consequently, in typical wireless applications, we 
find that the use of clipping-filtering techniques can reduce the PAPR down to about 10 dB 
and yet maintain OOB noise at acceptable levels. 

Notes 


1. The discussion on the PAPR problem presented herein closely follows the chapter article in 
Tellambura and Friese (2006). Another review paper of interest is Han and Lee (2005). 

2. In the context of (G.6), strictly speaking, |s(t)| is the envelope but not the transmitted signal; as 
such, (G.6) embodies the peak-to-mean envelope power ratio (PMEPR). Nevertheless, the PAPR is 
the term commonly used in the literature. 

3. In a way, this same statement also applies to the use of discrete multitone modulation (DMT) for 
digital subscriber lines (DSLs) in baseband data transmission, which was discussed in Chapter 8. 

4. The AM/AM and AM/PM conversions in power amplifiers are considered in Appendix H. 

5. Further reduction in PAPR can be accomplished through the use of sophisticated modulation and 
coding techniques; for a discussion of these and other PAPR-reduction techniques, see Tellambura 
and Friese (2006). Unfortunately, there is no single “best” technique for solving the PAPR-reduction 
problem. 


Nonlinear Solid-State Power Amplifiers 


One of the most critical constraints imposed on the design of hand-held devices 
(terminals) in mobile radio communications is that of limited battery power. These devices 
are designed for the purpose of a certain battery life or time taken for recharging the 
battery; the corresponding electronic circuitry must therefore respect the underlying power 
budget. Moreover, a significant consumer of power in mobile radio is the transmit power 
amplifier. Attention must therefore be paid to solid-state power amplifiers in mobile radio, 
hence this appendix. 

Another point to keep in mind is that power amplifiers are inherently nonlinear, 
regardless of where they are used in the design of communication systems. In this context 
we may classify nonlinearities into one of two types: 

• low-pass or band-pass; 

• memoryless or with memory. 

In this appendix, we focus attention on band-pass nonlinearities. 

Power Amplifier Nonlinearities 


There are many amplifier designs, and they have been traditionally categorized in the 
electronics literature as Class A, Class B, Class AB, Class C, Class D, and so on, typically 
increasingly nonlinear. Although Class A is considered to be a linear amplifier, no 
amplifier is truly linear; what linearity means in this context is that the operating point is 
chosen such that the amplifier behaves linearly over the signal range. The drawback of the 
Class A amplifier is that it is power inefficient. Typically, 25% or less of the input power is 
actually converted to radio-frequency (RF) power; the power that is left is converted to 
heat and, therefore, wasted. The remaining amplifier classes are designed to provide 
increasingly improved power efficiency, but at the expense of making the amplifier 
increasingly more nonlinear. 

Figure H.1 shows the measured gain characteristic of a solid-state power amplifier at 
two different frequencies: 1626 MHz and 1643 MHz. The curves show that the amplifier 
gain is approximately constant; that is, the amplifier is linear over a wide range of inputs. 
However, as the input level increases, the gain decreases, indicating that the amplifier is 
saturating. It can also be seen that there is a significant difference in amplifier performance 
at different frequencies. If this amplifier is operated at an average input level of -10 dBm 
with an amplitude swing of ±2 dB, then the amplifier would be considered linear. If, 
however, the input signal has an amplitude swing of ±10 dB, the amplifier would be 
considered nonlinear. The fact that the gain is not constant over all input levels means that 
the amplifier introduces amplitude distortion in the form of amplitude modulation (AM). 
Since the amplitude distortion depends upon the input level, it is typically referred to as 
AM-to-AM conversion. 

Figure H.1 Gain characteristic of a solid-state amplifier at two different operating frequencies: 1626 MHz and 1643 MHz. Horizontal axis: input level (dBm).

An ideal amplifier does not affect the phase of an input signal, except possibly for a 
constant phase rotation. Unfortunately, a practical amplifier behaves quite differently, as 
illustrated in Figure H.2, which shows the phase characteristic of the same power amplifier 
considered in Figure H.1. The fact that the phase characteristic is not constant over all 
input levels means that the amplifier introduces phase distortion in the form of phase 
modulation (PM). Since the phase distortion depends upon the input level, this second 
form of distortion is typically called AM-to-PM conversion. 

Figure H.2 Phase characteristic of a nonlinear amplifier at two different operating frequencies: 1626 MHz and 1643 MHz. Horizontal axis: input level (dBm).
An amplifier with “ideal” nonlinearity acts linearly up to a given point, whereafter it 
sets a hard limit on the input signal. This can sometimes be achieved by placing 
appropriate compensation around a nonideal amplifier. With this ideal nonlinearity, the 
phase distortion is assumed to be zero. In reality, however, we have amplitude distortion as 
well as phase distortion, as illustrated in Figure H.3. The operating point of the amplifier is 
often specified as the input back-off, defined as the root-mean-square (rms) input signal 
level $V_{\text{in,rms}}$ relative to the saturation input level $V_{\text{in,sat}}$ in decibels. That is, we define 

$$\text{IBO} = 10\log_{10}\left(\frac{V_{\text{in,rms}}}{V_{\text{in,sat}}}\right)^{2} \tag{H.1}$$

Alternatively, the operating point can be expressed in terms of the output back-off, defined as 

$$\text{OBO} = 10\log_{10}\left(\frac{V_{\text{out,rms}}}{V_{\text{out,sat}}}\right)^{2} \tag{H.2}$$

where $V_{\text{out,rms}}$ is the rms output signal level and $V_{\text{out,sat}}$ is the saturation output level. In both 
(H.1) and (H.2), the closeness to saturation determines the amount of distortion introduced 
by the amplifier. 




Figure H.3 Characterization of post-amplifier nonlinearity. (a) AM-AM conversion: output power (dB) versus input power (dB), with the saturation point, input back-off, and output back-off marked. (b) AM-PM conversion: output phase (deg) versus input power (dB), with the input back-off marked.
Thus, the operating point of the amplifier can be expressed in terms of the input back-off 
(IBO), defined as the input power measured relative to the saturation input level, both 
in decibels. Alternatively, it is expressed in terms of the output back-off (OBO), defined as 
the output power measured relative to the saturation output level, again both in decibels. 

Nonlinear Modeling of Band-Pass Power Amplifiers 


Consider a band-pass power amplifier, producing measurable output in response to band-pass 
inputs. In practice, we typically find that characterization of the amplifier is achieved 
by performing measurements on it, and then using the measurements to formulate an 
empirically based model. 

With this empirical approach to the nonlinear modeling of the power amplifier in mind, 
let the hybrid modulated signal 

$$x(t) = a(t)\cos(2\pi f_c t + \theta(t)) \tag{H.3}$$

be applied to the input of the amplifier, producing the output 

$$y(t) = g(a(t))\cos[2\pi f_c t + \theta(t) + \phi(a(t))] \tag{H.4}$$

where $g(\cdot)$ and $\phi(\cdot)$ are nonlinear functions of their respective arguments. This input-output 
characterization of the amplifier is justifiable provided that the 
bandwidth of the modulated signal x(t) is relatively small compared with the bandwidth 
of the power amplifier itself. 

Equation (H.4) embodies the two basic conversion characteristics of the power 
amplifier: 

• The AM-to-AM conversion, which is described by the nonlinear amplitude function 
$g(a(t))$ that is an odd function of the original amplitude $a(t)$. 

• The AM-to-PM conversion, which is described by the nonlinear phase function $\phi(a(t))$ that is 
an even function of $a(t)$. 

Thus, based on (H.4), we may construct the cascade nonlinear model of a band-pass 
amplifier, as depicted in Figure H.4. Herein, note that the AM/PM converter precedes the 
AM/AM converter, as it should be. 

Using a well-known trigonometric identity, we may reformulate (H.4) in the 
expanded form 

$$y(t) = y_I(t)\cos(2\pi f_c t + \theta(t)) - y_Q(t)\sin(2\pi f_c t + \theta(t)) \tag{H.5}$$



Figure H.4 Cascade nonlinear model of a band-pass power amplifier, driven by a hybrid-modulated input signal. The AM/PM converter maps the input $x(t) = a(t)\cos(2\pi f_c t + \theta(t))$ into $x'(t) = a(t)\cos[2\pi f_c t + \theta(t) + \phi(a(t))]$, and the AM/AM converter then produces $y(t) = g(a(t))\cos[2\pi f_c t + \theta(t) + \phi(a(t))]$.

Figure H.5 Quadrature nonlinear model of a band-pass power amplifier driven by a hybrid-modulated signal. 


For the in-phase component of the power amplifier output we have 

$$y_I(t) = g(a(t))\cos(\phi(a(t))) \tag{H.6}$$

and for its quadrature component we have 

$$y_Q(t) = g(a(t))\sin(\phi(a(t))) \tag{H.7}$$

Based on this second characterization of the power amplifier given in (H.7), we may 
construct the quadrature nonlinear model of the amplifier, depicted in Figure H.5. With 
the availability of such a model, the road is paved for Monte Carlo simulations to study the 
nonlinear behavior of solid-state power amplifiers that are of the band-pass variety. 
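As a minimal sketch of how the cascade and quadrature descriptions relate, the following Python fragment evaluates the amplifier output both ways and shows that they agree. The appendix does not specify g(·) and φ(·); here they are assumed to take the two-parameter rational form associated with the Saleh (1981) model cited in the Notes, and the coefficient values are illustrative placeholders rather than measurements of the amplifier in Figures H.1 and H.2. The function names are hypothetical.

```python
import math

# Assumed AM/AM and AM/PM curves (two-parameter rational form).
# The coefficients are illustrative placeholders, not measured data.
ALPHA_A, BETA_A = 2.1587, 1.1517   # AM/AM coefficients (assumed)
ALPHA_P, BETA_P = 4.0033, 9.1040   # AM/PM coefficients (assumed)


def g(a):
    """AM/AM conversion: output amplitude, an odd function of a."""
    return ALPHA_A * a / (1.0 + BETA_A * a * a)


def phi(a):
    """AM/PM conversion: phase shift in radians, an even function of a."""
    return ALPHA_P * a * a / (1.0 + BETA_P * a * a)


def output_polar(a, theta, fc, t):
    """Cascade form: y(t) = g(a) cos[2 pi fc t + theta + phi(a)]."""
    return g(a) * math.cos(2.0 * math.pi * fc * t + theta + phi(a))


def output_quadrature(a, theta, fc, t):
    """Quadrature form: y(t) = y_I cos(.) - y_Q sin(.)."""
    y_i = g(a) * math.cos(phi(a))   # in-phase component
    y_q = g(a) * math.sin(phi(a))   # quadrature component
    carrier_phase = 2.0 * math.pi * fc * t + theta
    return y_i * math.cos(carrier_phase) - y_q * math.sin(carrier_phase)


if __name__ == "__main__":
    for a in (0.1, 0.5, 1.0, 2.0):
        print(a, g(a), math.degrees(phi(a)))
```

Sweeping the input amplitude a in such a sketch traces out AM-to-AM and AM-to-PM curves of the kind shown in Figure H.3; a measured amplifier would replace the assumed g and phi with fitted or tabulated data.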

Notes 


1. A model described in (Saleh, 1981) is well-suited for studying the in-phase and quadrature 
components of the output produced by a nonlinear power amplifier. 

2. For detailed discussion of band-pass nonlinearity in power amplifiers, the reader is referred to the 
book (Tranter et al. 2004). 



Monte Carlo Integration 

In a generic sense, Monte Carlo simulation is an invaluable experimental tool for tackling 
difficult problems that are mathematically intractable; but the tool is imprecise in that it 
provides statistical estimates. Nevertheless, provided that the Monte Carlo simulation is 
conducted properly, valuable insight into a problem of interest is obtained, which would 
be difficult otherwise. 

In this appendix, we focus on Monte Carlo integration, which is a special form of 
Monte Carlo simulation. Specifically, we address the difficult integration problem 
encountered in Chapter 5 dealing with computation of the differential entropy h(Y), based 
on the conditional probability density function of (5.102) in Chapter 5. 

To elaborate, we may say: 


Let W denote the difficult area over which random sampling of the differential entropy 
h(Y) is to be performed. To get around this difficulty, let V denote an area so configured 
that it includes the area W and is easy to sample randomly. Desirably, the selected area V 
encloses W as closely as possible, for the simple reason that samples picked outside of W 
are of no practical interest. 

Suppose now we pick a total of N samples in the area V, randomly and uniformly. Then 
according to Press, et al. (1998), the basic Monte Carlo integration theorem states that a 
computed “estimate” of the integral defining the differential entropy h(Y) is given by 

$$h(Y) \approx V\langle h\rangle \pm V\left(\frac{\langle h^2\rangle - \langle h\rangle^2}{N}\right)^{1/2} \tag{I.1}$$

where the average value (i.e., mean) 

$$\langle h\rangle = \frac{1}{N}\sum_{i=1}^{N} h(y_i) \tag{I.2}$$

and the mean-square value 

$$\langle h^2\rangle = \frac{1}{N}\sum_{i=1}^{N} h^2(y_i) \tag{I.3}$$

The $y_i$ in (I.2) and (I.3) is the ith sample of the random variable Y picked from the area V. 
The “plus or minus” sign in the approximate formula of (I.1) should not be viewed as a 
rigorous bound. Rather, it represents a “one standard-deviation error” that results from the 
use of Monte Carlo integration. 
Clearly, the larger we make the number of samples N, the smaller this error will be, 
resulting in a more accurate integration. However, this improvement is attained at the cost 
of increased computational complexity. 
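The estimate and its one-standard-deviation error can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: instead of the conditional density of Chapter 5, it integrates the entropy integrand of a unit-variance Gaussian density over a one-dimensional "area" V = [-8, 8], because the exact answer, (1/2)log2(2πe) bits, is known and can be compared with the estimate. The function names, the interval, the sample count, and the seed are all hypothetical choices.

```python
import math
import random


def gaussian_pdf(y, sigma=1.0):
    """Zero-mean Gaussian density (assumed test case, not the Chapter 5 density)."""
    return math.exp(-0.5 * (y / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def entropy_integrand(y):
    """Integrand of the differential entropy in bits: -f(y) log2 f(y)."""
    f = gaussian_pdf(y)
    return -f * math.log2(f)


def monte_carlo_integral(func, lower, upper, num_samples, rng):
    """Return (estimate, one-standard-deviation error) for the integral of func."""
    volume = upper - lower          # the "area" V, here an interval length
    total = 0.0
    total_sq = 0.0
    for _ in range(num_samples):
        value = func(rng.uniform(lower, upper))   # uniform sample in V
        total += value
        total_sq += value * value
    mean = total / num_samples                    # average value <h>
    mean_sq = total_sq / num_samples              # mean-square value <h^2>
    estimate = volume * mean
    one_sigma = volume * math.sqrt(max(mean_sq - mean * mean, 0.0) / num_samples)
    return estimate, one_sigma


rng = random.Random(2024)   # fixed seed so the run is repeatable
estimate, one_sigma = monte_carlo_integral(entropy_integrand, -8.0, 8.0, 200_000, rng)
exact = 0.5 * math.log2(2.0 * math.pi * math.e)
print(f"estimate = {estimate:.4f} +/- {one_sigma:.4f} bits, exact = {exact:.4f} bits")
```

Because the error term shrinks only as the inverse square root of N, halving the error requires roughly four times as many samples, which is the complexity cost referred to above.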

Notes 


1. Monte Carlo simulation derives its name from the city, Monte Carlo, Monaco, which is widely 
known for its casino gambling: a “game of chance.” 

The term “Monte Carlo” was introduced into the technical literature by von Neumann and Ulam 
during World War II. Its adoption was intended as a codeword for the secret work that was going on 
at the time in Los Alamos, New Mexico, USA. 


Maximal-Length Sequences 

Basically, maximal-length sequences, also referred to in the literature as m-sequences, are 
linear cyclic codes, the generation of which is realized by using a linear feedback shift 
register (LFSR) as discussed in Chapter 10 on error-control coding; Figure J.1 is an 
illustrative example of an LFSR. However, from a practical perspective insofar as this book is 
concerned, it is the pseudo-noise (PN) characteristic that befits their use in producing 
spread-spectrum signals, an issue that was discussed in Section 9.13 of Chapter 9. In short, 
a maximal-length sequence viewed as a “carrier” may be used to spread the spectrum of an 
incoming message sequence in the transmitter and despread the received signal so as to 
recover the original message signal at the receiver output. 

It is therefore apropos that we begin the discussion of maximal-length sequences in this 
appendix by discussing their basic properties, illustrated by the LFSR as the sequence 
generator. 

Properties of Maximal-Length Sequences 


Maximal-length sequences have many of the properties possessed by a truly random 
binary sequence. A random binary sequence is a sequence in which the presence of binary 
symbol 1 or 0 is equally probable. Maximal-length sequences have the following 
properties. 

Balance Property 

In each period of a maximal-length sequence, the number of 1s is always one more than 
the number of 0s. 

Run Property 

Among the runs of 1s and of 0s in each period of a maximal-length sequence, one-half the 
runs of each kind are of length one, one-fourth are of length two, one-eighth are of length 
three, and so on as long as these fractions represent meaningful numbers of runs. 


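Figure J.1 Maximal-length sequence generator for m = 3, where m is the number of flip-flops in the generator. A modulo-2 adder provides the feedback, the flip-flops are driven by a common clock, and the last flip-flop supplies the output sequence.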
By a “run” we mean a subsequence of identical symbols (1s or 0s) within one period of 
the sequence. The length of this subsequence is the length of the run. For a maximal-length 
sequence generated by a linear feedback shift register (LFSR) of length m, the total 
number of runs is $(N + 1)/2$, where $N = 2^m - 1$. 
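The balance and run properties are easy to check numerically. The sketch below, offered as an illustration rather than a general tool, takes one period of the m = 5 sequence listed with Table J.2 later in this appendix and counts its 1s, 0s, and cyclic runs; the helper name `cyclic_runs` is hypothetical.

```python
from collections import Counter
from itertools import groupby

# One period (N = 31) of the m = 5 sequence quoted with Table J.2.
PERIOD = "0000101011101100011111001101001"


def cyclic_runs(bits):
    """Return (symbol, run length) pairs, treating the period as cyclic.

    Assumes the sequence contains both symbols, which holds for any
    maximal-length sequence.
    """
    n = len(bits)
    # Rotate so that the period starts at a run boundary; bits[-1] wraps around.
    start = next(i for i in range(n) if bits[i] != bits[i - 1])
    rotated = bits[start:] + bits[:start]
    return [(symbol, len(list(group))) for symbol, group in groupby(rotated)]


runs = cyclic_runs(PERIOD)
ones_runs = Counter(length for symbol, length in runs if symbol == "1")
zeros_runs = Counter(length for symbol, length in runs if symbol == "0")

print("number of 1s:", PERIOD.count("1"), "number of 0s:", PERIOD.count("0"))
print("total runs:", len(runs))
print("runs of 1s by length:", dict(ones_runs))
print("runs of 0s by length:", dict(zeros_runs))
```

For this period there are (31 + 1)/2 = 16 runs, eight of each kind; four of the eight have length one, two have length two, and one has length three, after which the fractions stop representing whole numbers of runs.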


Correlation Property 

The autocorrelation function of a maximal-length sequence is periodic and binary valued. 
As mentioned previously, the period of a maximal-length sequence is defined by 

$$N = 2^m - 1$$

where m is the length of the LFSR. Let binary symbols 0 and 1 of the sequence be denoted 
by the levels -1 and +1, respectively. Let c(t) denote the resulting waveform of the 
maximal-length sequence, as illustrated in Figure J.2a for N = 7. Hence, the period of 
the waveform c(t) is 

$$T_b = N T_c$$

where $T_c$ is the duration assigned to binary symbol 1 or 0 in the maximal-length sequence. 
The autocorrelation function of the periodic waveform c(t) is 
defined by 

$$R_c(\tau) = \frac{1}{T_b}\int_{-T_b/2}^{T_b/2} c(t)\,c(t-\tau)\,dt$$

where the lag $\tau$ lies in the interval $(-T_b/2, T_b/2)$. Applying this formula to c(t), we get 

$$R_c(\tau) = \begin{cases} 1 - \dfrac{N+1}{N T_c}|\tau|, & |\tau| \le T_c \\[2mm] -\dfrac{1}{N}, & \text{for the remainder of the period} \end{cases}$$

This result is plotted in Figure J.2b for the case of m = 3 or N = 7. 
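The two-valued behavior can be checked numerically for the N = 7 case. The sketch below assumes the period 0011101 shown in Figure J.2a, maps it to levels -1 and +1, oversamples each chip (the factor of 4 is an arbitrary assumption), and compares the periodic autocorrelation at lags of a quarter chip with the piecewise formula: a triangle of base 2T_c around zero lag and the constant -1/N elsewhere. Exact rational arithmetic avoids rounding questions; the function names are hypothetical.

```python
from fractions import Fraction

CHIPS = [0, 0, 1, 1, 1, 0, 1]   # one period for m = 3, as in Figure J.2a
SAMPLES_PER_CHIP = 4            # oversampling factor (assumed)
N = len(CHIPS)


def waveform_samples(chips, samples_per_chip):
    """Map bits 0/1 to levels -1/+1 and hold each level for one chip."""
    return [1 if bit else -1 for bit in chips for _ in range(samples_per_chip)]


def periodic_autocorrelation(samples, lag):
    """Time-average of c(t) c(t - lag) over one period, lag in samples."""
    n = len(samples)
    return Fraction(sum(samples[i] * samples[(i - lag) % n] for i in range(n)), n)


def formula(tau_in_chips, period):
    """Piecewise result for |tau| no larger than half a period (T_c = 1)."""
    tau = abs(tau_in_chips)
    if tau <= 1:
        return 1 - Fraction(period + 1, period) * tau
    return Fraction(-1, period)


samples = waveform_samples(CHIPS, SAMPLES_PER_CHIP)
half_period = (N * SAMPLES_PER_CHIP) // 2
for lag in range(-half_period, half_period + 1):
    tau = Fraction(lag, SAMPLES_PER_CHIP)
    print(tau, periodic_autocorrelation(samples, lag), formula(tau, N))
```

At zero lag the value is 1, at a lag of one chip it has fallen to -1/7, and it stays there for the remainder of the period.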

From Fourier transform theory, covered in Chapter 2, we know that periodicity in the 
time domain is transformed into uniform sampling in the frequency domain. This interplay 
between the time and frequency domains is borne out by the power spectral density of the 
maximal-length wave c(t). Specifically, taking the Fourier transform of (J.4), we get the 
sampled spectrum 

$$S_c(f) = \frac{1}{N^2}\,\delta(f) + \frac{1+N}{N^2}\sum_{\substack{n=-\infty \\ n \ne 0}}^{\infty} \operatorname{sinc}^2\!\left(\frac{n}{N}\right)\delta\!\left(f - \frac{n}{N T_c}\right)$$

which is plotted in Figure J.2c for m = 3 or N = 7. As N approaches infinity, $S_c(f)$ 
approaches a continuous function of frequency f. 
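A quick consistency check on the line spectrum: the waveform c(t) takes only the levels -1 and +1, so its average power is 1, and the weights of the delta functions, namely 1/N² at f = 0 and ((1 + N)/N²) sinc²(n/N) at f = n/(N T_c), should therefore sum to one. The sketch below adds up the weights numerically for N = 7; the truncation point of the infinite sum and the function names are assumptions made for illustration.

```python
import math


def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def line_weight(n, period):
    """Weight of the spectral line at f = n / (N T_c)."""
    if n == 0:
        return 1.0 / period ** 2
    return (1.0 + period) / period ** 2 * sinc(n / period) ** 2


def total_power(period, max_harmonic):
    """Sum of line weights for |n| <= max_harmonic (truncated infinite sum)."""
    return sum(line_weight(n, period)
               for n in range(-max_harmonic, max_harmonic + 1))


print(total_power(7, 100_000))   # close to 1
print(line_weight(7, 7))         # essentially zero: a null at f = 1/T_c
```

The lines at nonzero multiples of N vanish, which reproduces the nulls of the sinc² envelope at multiples of 1/T_c.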
Figure J.2 (a) Waveform of maximal-length sequence for length m = 3 or period N = 7; the binary sequence 0 0 1 1 1 0 1 repeats, with levels +1 and -1 plotted against time t. 
(b) Autocorrelation function. (c) Power spectral density. All three parts refer to the 
output of the feedback shift register of Figure J.1. 
Comparing the results of Figure J.2c for a maximal-length sequence with the 
corresponding results shown in Figure 4.12 of Chapter 4 on stochastic processes, for a 
corresponding random binary sequence, we may make two observations: 

1. For a period of the maximal-length sequence, the autocorrelation function $R_c(\tau)$ is 
somewhat similar to that of a random binary sequence. 

2. The waveforms of both sequences have the same envelope, $\operatorname{sinc}^2(fT)$, for their power 
spectral densities. The fundamental difference between them is that whereas the 
random binary sequence has a continuous spectral density characteristic, the 
corresponding characteristic of a maximal-length sequence is discrete, consisting of 
delta functions spaced $1/(N T_c)$ Hz apart. 

As the shift-register length m or, equivalently, the period N of the maximal-length 
sequence is increased, the maximal-length sequence becomes increasingly similar to the 
random binary sequence. Indeed, in the limit, the two sequences become identical when N 
is made infinitely large. However, the price paid for making N large is an increasing 
storage requirement, which imposes a practical limit on how large N can actually be made 
in practical applications of spread spectrum modulation. 

Choosing a Maximal-Length Sequence 


Now that we understand the properties of a maximal-length sequence and the fact that we 
can generate it using a linear feedback shift register, the key question that we need to 
address is: 

How do we find the feedback logic for a desired period N? 


The answer to this question is to be found in the theory of error-control codes, which is 
covered in Chapter 10. The task of finding the required feedback logic is made 
particularly easy for us by virtue of the extensive tables of the necessary feedback 
connections for varying shift-register lengths that have been compiled in the literature. In 
Table J.1 we present the sets of maximal (feedback) taps pertaining to shift-register 
lengths m = 2, 3, ..., 8. Note that, as m increases, the number of alternative schemes 
(codes) is enlarged. Also, for every set of feedback connections shown in this table, there 
is an “image” set that generates an identical maximal-length code, reversed in time 
sequence. Note also that the particular sets, identified with an asterisk in Table J.1, 
correspond to Mersenne prime length sequences, for which the period N is a prime 
number. 

Table J.1 Maximal-length sequences of shift-register lengths 2-8 

Shift-register length, m    Feedback taps 
2*    [2,1] 
3*    [3,1] 
4     [4,1] 
5*    [5,2], [5,4,3,2], [5,4,2,1] 
6     [6,1], [6,5,2,1], [6,5,3,2] 
7*    [7,1], [7,3], [7,3,2,1], [7,4,3,2], [7,6,4,2], [7,6,3,1], [7,6,5,2], [7,6,5,4,2,1], [7,5,4,3,2,1] 
8     [8,4,3,2], [8,6,5,3], [8,6,5,2], [8,5,3,1], [8,6,5,1], [8,7,6,1], [8,7,6,5,2,1], [8,6,4,3,2,1] 

Maximal-Length Code Generation 

Consider a maximal-length sequence requiring the use of a linear feedback shift register 
of length m = 5. For feedback taps, we select the set [5,2] from Table J.1. The 
corresponding configuration of the code generator is shown in Figure J.3a. Assuming that 
the initial state is 10000, the evolution of one period of the maximal-length sequence 
generated by this scheme is shown in Table J.2, where we see that the generator returns to 
the initial 10000 after 31 iterations; that is, the period is 31, which agrees with the value 
obtained from (J.2). 

Suppose, next, we select another set of feedback taps from Table J.1, namely [5,4,2,1]. 
The corresponding code generator is as shown in Figure J.3b. For the initial state 10000, we 
now find that the evolution of the maximal-length sequence is as shown in Table J.3. Here 
again, the generator returns to the initial state 10000 after 31 iterations, and so it should. 
But the maximal-length sequence generated is different from that shown in Table J.2. 

Clearly, the code generator of Figure J.3a has an advantage over that of Figure J.3b, as 
it requires fewer feedback connections. 
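The two generators can be simulated in a few lines. The sketch below assumes a particular reading of the tap notation that is consistent with the code sequences quoted with Tables J.2 and J.3: the taps are flip-flop positions counted from 1, the modulo-2 sum of the tapped outputs is fed into the first flip-flop, the register contents shift one place per clock, and the output is taken from the last flip-flop. The function name `lfsr_sequence` is hypothetical.

```python
def lfsr_sequence(taps, initial_state):
    """Generate one period of an LFSR output sequence.

    taps          -- flip-flop positions (1-based) feeding the modulo-2 adder
    initial_state -- contents of flip-flops 1..m, as a list of 0s and 1s
    Returns (output bits over 2**m - 1 clock cycles, final register state).
    """
    state = list(initial_state)
    period = 2 ** len(state) - 1
    output = []
    for _ in range(period):
        output.append(state[-1])          # output is the last flip-flop
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]    # modulo-2 addition of tapped stages
        state = [feedback] + state[:-1]   # shift right, feed back into stage 1
    return output, state


initial = [1, 0, 0, 0, 0]
for taps in ([5, 2], [5, 4, 2, 1]):
    bits, final_state = lfsr_sequence(taps, initial)
    print(taps, "".join(str(b) for b in bits),
          "returns to initial state:", final_state == initial)
```

Under this convention both tap sets bring the register back to 10000 after 31 clock cycles and reproduce the two code sequences quoted with Tables J.2 and J.3; the [5,2] generator needs a single two-input modulo-2 adder, whereas [5,4,2,1] needs a four-input sum.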


Figure J.3 Two different configurations of feedback shift register of length 
m = 5. (a) Feedback connections [5,2]. (b) Feedback connections [5,4,2,1]. 

Table J.2 Evolution of the maximal-length sequence 
generated by the feedback-shift register of Figure J.3a 
Code generated: 0000101011101100011111001101001. 

Table J.3 Evolution of the maximal-length sequence generated 
by the feedback-shift register of Figure J.3b 
Code generated: 0000110101001000101111101100111. 
Notes 


1. For further details on maximal-length sequences, see Golomb (1964: 1-32), Simon, et al. (1985: 
283-295), and Peterson and Weldon (1972). The last reference includes an extensive list of 
polynomials for generating maximal-length sequences. For a tutorial paper on PN sequences, see 
Sarwate and Pursley (1980). 

2. Table J.1 is extracted from the book by Dixon (1984: 81-83), where feedback connections of 
maximal-length sequences are tabulated for shift-register length m extending up to 89. 


Mathematical Tables 


Trigonometric identities 

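$$\exp(\pm j\theta) = \cos\theta \pm j\sin\theta$$

$$\cos\theta = \tfrac{1}{2}[\exp(j\theta) + \exp(-j\theta)]$$

$$\sin\theta = \tfrac{1}{2j}[\exp(j\theta) - \exp(-j\theta)]$$

$$\sin^2\theta + \cos^2\theta = 1$$

$$\cos^2\theta - \sin^2\theta = \cos(2\theta)$$

$$\cos^2\theta = \tfrac{1}{2}[1 + \cos(2\theta)]$$

$$\sin^2\theta = \tfrac{1}{2}[1 - \cos(2\theta)]$$

$$2\sin\theta\cos\theta = \sin(2\theta)$$

$$\sin(\alpha \pm \beta) = \sin\alpha\cos\beta \pm \cos\alpha\sin\beta$$

$$\cos(\alpha \pm \beta) = \cos\alpha\cos\beta \mp \sin\alpha\sin\beta$$

$$\tan(\alpha \pm \beta) = \frac{\tan\alpha \pm \tan\beta}{1 \mp \tan\alpha\tan\beta}$$

$$\sin\alpha\sin\beta = \tfrac{1}{2}[\cos(\alpha - \beta) - \cos(\alpha + \beta)]$$

$$\cos\alpha\cos\beta = \tfrac{1}{2}[\cos(\alpha - \beta) + \cos(\alpha + \beta)]$$

$$\sin\alpha\cos\beta = \tfrac{1}{2}[\sin(\alpha - \beta) + \sin(\alpha + \beta)]$$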
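Series expansions 

Taylor series 

$$f(x) = f(a) + \frac{f'(a)}{1!}(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \cdots$$

where 

$$f^{(n)}(a) = \left.\frac{d^n f(x)}{dx^n}\right|_{x=a}$$

MacLaurin series 

$$f(x) = f(0) + \frac{f'(0)}{1!}x + \frac{f''(0)}{2!}x^2 + \cdots + \frac{f^{(n)}(0)}{n!}x^n + \cdots$$

where 

$$f^{(n)}(0) = \left.\frac{d^n f(x)}{dx^n}\right|_{x=0}$$

Binomial series 

$$(1+x)^n = 1 + nx + \frac{n(n-1)}{2!}x^2 + \cdots, \quad |nx| < 1$$

Exponential series 

$$\exp x = 1 + x + \frac{1}{2!}x^2 + \cdots$$

Logarithmic series 

$$\ln(1+x) = x - \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \cdots$$

Trigonometric series 

$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots$$

$$\cos x = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots$$

$$\tan x = x + \tfrac{1}{3}x^3 + \tfrac{2}{15}x^5 + \cdots$$

$$\sin^{-1}x = x + \tfrac{1}{6}x^3 + \tfrac{3}{40}x^5 + \cdots$$

$$\tan^{-1}x = x - \tfrac{1}{3}x^3 + \tfrac{1}{5}x^5 - \cdots, \quad |x| < 1$$

$$\operatorname{sinc} x = 1 - \frac{1}{3!}(\pi x)^2 + \frac{1}{5!}(\pi x)^4 - \cdots$$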
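Integrals 

Indefinite integrals 

$$\int x\sin(ax)\,dx = \frac{1}{a^2}[\sin(ax) - ax\cos(ax)]$$

$$\int x\cos(ax)\,dx = \frac{1}{a^2}[\cos(ax) + ax\sin(ax)]$$

$$\int x\exp(ax)\,dx = \frac{1}{a^2}\exp(ax)(ax - 1)$$

$$\int x\exp(ax^2)\,dx = \frac{1}{2a}\exp(ax^2)$$

$$\int \exp(ax)\sin(bx)\,dx = \frac{1}{a^2+b^2}\exp(ax)[a\sin(bx) - b\cos(bx)]$$

$$\int \exp(ax)\cos(bx)\,dx = \frac{1}{a^2+b^2}\exp(ax)[a\cos(bx) + b\sin(bx)]$$

$$\int \frac{dx}{a^2+b^2x^2} = \frac{1}{ab}\tan^{-1}\!\left(\frac{bx}{a}\right)$$

$$\int \frac{x^2\,dx}{a^2+b^2x^2} = \frac{x}{b^2} - \frac{a}{b^3}\tan^{-1}\!\left(\frac{bx}{a}\right)$$

Definite integrals 

$$\int_0^{\infty} \frac{x\sin(ax)}{b^2+x^2}\,dx = \frac{\pi}{2}\exp(-ab), \quad a > 0,\ b > 0$$

$$\int_0^{\infty} \frac{\cos(ax)}{b^2+x^2}\,dx = \frac{\pi}{2b}\exp(-ab), \quad a > 0,\ b > 0$$

$$\int_0^{\infty} \frac{\cos(ax)}{(b^2-x^2)^2}\,dx = \frac{\pi}{4b^3}[\sin(ab) - ab\cos(ab)], \quad a > 0,\ b > 0$$

$$\int_0^{\infty} \operatorname{sinc} x\,dx = \int_0^{\infty} \operatorname{sinc}^2 x\,dx = \frac{1}{2}$$

$$\int_0^{\infty} \exp(-ax^2)\,dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}, \quad a > 0$$

$$\int_0^{\infty} x^2\exp(-ax^2)\,dx = \frac{1}{4a}\sqrt{\frac{\pi}{a}}, \quad a > 0$$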
Useful constants 

Physical constants 

Boltzmann’s constant                          k = 1.38 x 10^-23 J/K 
Planck’s constant                             h = 6.626 x 10^-34 J·s 
Electron (fundamental) charge                 q = 1.602 x 10^-19 C 
Speed of light in vacuum                      c = 2.998 x 10^8 m/s 
Standard (absolute) temperature               T_0 = 273 K 
Thermal voltage                               V_T = 0.026 V at room temperature 
Thermal energy kT at standard temperature     kT_0 = 3.77 x 10^-21 J 
1 Hz = 1 cycle/s; 1 cycle = 2π radians 
1 W = 1 J/s 

Mathematical constants 

Base of natural logarithm                     e = 2.7182818 
Logarithm of e to base 2                      log_2 e = 1.442695 
Logarithm of 2 to base e                      ln 2 = 0.693147 
Logarithm of 2 to base 10                     log_10 2 = 0.30103 
Pi                                            π = 3.1415927 

Recommended unit prefixes 


Multiple    Prefix    Symbol 
10^12       tera      T 
10^9        giga      G 
10^6        mega      M 
10^3        kilo      k 
10^-3       milli     m 
10^-6       micro     μ 
10^-9       nano      n 
10^-12      pico      p 
Conventions and Notations 


The symbol | | means the absolute value or magnitude of the complex quantity contained 
within. 

The symbol arg( ) means the phase angle of the complex quantity contained within the 
brackets. 

The symbol Re[ ] means the “real part of” and Im[ ] means the “imaginary part of” the 
complex quantity contained within the brackets. 

The natural logarithm is denoted by ln. 

Logarithms to bases 2 and 10 are denoted by $\log_2$ and $\log_{10}$, respectively. 

The use of an asterisk as superscript denotes complex conjugate; e.g., x* is the complex 
conjugate of x. 

The symbol ⇌ indicates a Fourier-transform pair, e.g., g(t) ⇌ G(f), where a lowercase 
letter denotes the time function and a corresponding uppercase letter denotes the frequency 
function. 

The symbol F[ ] indicates the Fourier-transform operation on a time function enclosed within 
the brackets, e.g., F[g(t)] = G(f). 

The symbol $F^{-1}$[ ] indicates the inverse Fourier-transform operation of a frequency function 
enclosed within the brackets, e.g., $F^{-1}$[G(f)] = g(t). 

The symbol ★ denotes convolution, e.g., 

$$x(t) \star h(t) = \int_{-\infty}^{\infty} x(\tau)\,h(t-\tau)\,d\tau$$

In Chapter 10 on error-control coding, the symbol ⊕ is used in the figures, but when it comes 
to binary arithmetic, the modulo-2 addition is denoted by an ordinary plus sign throughout 
that chapter; the same statement applies to Appendix J on maximal-length codes. 

The use of subscript $T_0$ indicates that the pertinent function $g_{T_0}(t)$, say, is a periodic function 
of time t with period $T_0$. 

The use of a hat over a function indicates one of two things: 

The Hilbert transform of a function; e.g., the function $\hat{g}(t)$ is the Hilbert transform of 
g(t). 

The estimate of an unknown parameter, e.g., the quantity $\hat{\alpha}(\mathbf{x})$ is an estimate of the 
unknown parameter $\alpha$, based on the observation vector x. 

The impulse response of a linear time-invariant system is denoted by h(t), and its transfer 
function is denoted by H(f); the two of them, h(t) and H(f), form a Fourier-transform pair. 

The use of a tilde over a function indicates the complex envelope of a narrowband signal; 
e.g., the function $\tilde{g}(t)$ is the complex envelope of the narrowband signal g(t). The exception 
to this convention is in Section 10.12, where, in the description of turbo decoding, the tilde in 
$\tilde{L}(m_j)$ is used to signify extrinsic information and thereby distinguish it from log-likelihood 
ratio. 

The use of subscript + indicates the pre-envelope of a signal; e.g., the function $g_+(t)$ is the 
pre-envelope of the signal g(t). We may thus write $g_+(t) = g(t) + j\hat{g}(t)$, where $\hat{g}(t)$ is the 
Hilbert transform of g(t). The use of subscript - indicates that $g_-(t) = g(t) - j\hat{g}(t) = g_+^*(t)$. 

The use of subscripts I and Q indicates the in-phase and quadrature components of a 
narrowband signal, a narrowband random process, or the impulse response of a narrowband 
filter, with respect to the carrier $\cos(2\pi f_c t)$. 

For a low-pass message signal, the highest frequency component or message bandwidth is 
denoted by W. The spectrum of this signal occupies the frequency interval $-W \le f \le W$ and is 
zero elsewhere. For a band-pass signal with carrier frequency $f_c$, the spectrum occupies the 
frequency intervals $f_c - W \le f \le f_c + W$ and $-f_c - W \le f \le -f_c + W$, so 2W denotes the bandwidth 
of the signal. The (low-pass) complex envelope of this band-pass signal has a spectrum that 
occupies the frequency interval $-W \le f \le W$. 

For a low-pass filter, the bandwidth is denoted by B. A common definition of filter 
bandwidth is the frequency at which the magnitude response of the filter drops by 3 dB 
below the zero-frequency value. For a band-pass filter with mid-band frequency $f_c$, the 
bandwidth is denoted by 2B, centered on $f_c$. The complex low-pass equivalent of this band-pass 
filter has a bandwidth equal to B. 

The transmission bandwidth of a communication channel, required to transmit a modulated 
signal, is denoted by $B_T$. 

Random variables or random vectors are uppercase (e.g., X or X) and their sample values are 
lowercase (e.g., x or x). The symbol P[ ] signifies the probability of an event enclosed within 
the brackets; for example, $P[X \le x]$ signifies the probability that the random 
variable X assumes a value equal to or less than the sample value x. 

A vertical bar in an expression means “given that” or “conditional on”; e.g., $f_X(x|H_0)$ is the 
probability density function of the random variable X given that hypothesis $H_0$ is true. 

The symbol E[ ] means the expected value of the random variable enclosed within; the E acts 
as an operator. 

The symbol var[ ] means the variance of the random variable enclosed within. 

The symbol cov[ ] means the covariance of the two random variables enclosed within. 

The average probability of symbol error is denoted by $P_e$. 

In the case of binary signaling techniques, $p_{10}$ denotes the conditional probability of error 
given that symbol 0 was transmitted, and $p_{01}$ denotes the conditional probability of error 
given that symbol 1 was transmitted. The a priori probabilities of symbols 0 and 1 are 
denoted by $p_0$ and $p_1$, respectively. 

The symbol ⟨ ⟩ denotes the time average of the sample function enclosed within. 

Boldface letter denotes a vector or matrix. The inverse of a square matrix R is denoted by $\mathbf{R}^{-1}$. 
The transpose of a vector w is denoted by $\mathbf{w}^T$. The Hermitian transpose of a complex-valued 
vector x is denoted by $\mathbf{x}^\dagger$; Hermitian transposition involves both transposition and 
complex conjugation. 

The length of a vector x is denoted by $\|\mathbf{x}\|$. The Euclidean distance between the vectors $\mathbf{x}_i$ 
and $\mathbf{x}_j$ is denoted by $d_{ij} = \|\mathbf{x}_i - \mathbf{x}_j\|$. 

The inner product of two real-valued vectors x and y is denoted by $\mathbf{x}^T\mathbf{y}$; their outer product is 
denoted by $\mathbf{x}\mathbf{y}^T$. If the vectors x and y are complex valued, their inner product is $\mathbf{x}^\dagger\mathbf{y}$, and their 
outer product is $\mathbf{x}\mathbf{y}^\dagger$. 

In set theory, the symbols ∪ and ∩ stand for the union and intersection, respectively, of two 
random variables A and B, for example. 

The symbol $A^c$ stands for the complement of random variable A. 

In stochastic processes theory, $M_{XX}(t_1, t_2)$ stands for the autocorrelation of a stochastic 
process X(t) sampled at times $t_1$ and $t_2$ when no conditions are imposed on X(t). In the special 
case of a weakly (wide-sense) stationary process X(t), the autocorrelation function is denoted 
by $R_{XX}(\tau)$ for some time shift $\tau$, and sometimes this symbol is simplified to $R_X(\tau)$; the time 
shift $\tau$ is also referred to as delay. Similar notations are used for cross-correlation, namely 
$M_{XY}(t_1, t_2)$ for a pair of generic stochastic processes X(t) and Y(t), and $R_{XY}(\tau)$ for the special 
case of two weakly (wide-sense) stationary processes. 

In information theory, the symbol H(S) denotes the entropy of a discrete event S. For a 
continuous random variable denoted by X, the symbol h(X) is used to denote its differential 
entropy. 

Given a pair of continuous random variables X and Y, their mutual information is denoted by 
I(X ; Y). 

Channel capacity is denoted by C. 

In error-control coding, the code rate is denoted by r. 

The syndrome in decoding of linear block codes is denoted by S. 

In convolutional codes, the symbol $L(\mathbf{m}_j|\mathbf{r}_j)$ is used to denote the log-likelihood ratio of 
the message vector $\mathbf{m}_j$ given the received vector $\mathbf{r}_j$ at time-step j. 

For MAP (maximum a posteriori) decoding, the following symbols are used: 

• The L-value denotes a log-likelihood ratio of two conditional probabilities, the 
numerator pertaining to binary symbol 1 and the denominator pertaining to binary 
symbol 0. 

• $L_a(m_j)$ denotes the a priori L-value at time-step j of the decoding algorithm for message 
bit $m_j$. 

• $L_p(m_j)$ denotes the a posteriori L-value at time-step j of the decoding algorithm for 
message bit $m_j$. 

• $L_c$ denotes the transmission reliability factor. 

• The symbols $\alpha_j(s)$, $\gamma_j(s, s')$, and $\beta_{j+1}(s')$ denote the forward metric for state s at time-step 
j, the transition metric for going from state s' to s at time-step j, and the backward 
metric for state s' at time-step j + 1, respectively. 

Lastly and rather importantly: to avoid confusion in the use of italics throughout the book, d 
is used to denote a differential and j is used to denote the square root of -1. 
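Functions 

Rectangular function: 

$$\operatorname{rect}(t) = \begin{cases} 1, & -\tfrac{1}{2} < t < \tfrac{1}{2} \\ 0, & |t| > \tfrac{1}{2} \end{cases}$$

Unit-step function: 

$$u(t) = \begin{cases} 1, & t > 0 \\ 0, & t < 0 \end{cases}$$

Signum function: 

$$\operatorname{sgn}(t) = \begin{cases} 1, & t > 0 \\ 0, & t = 0 \\ -1, & t < 0 \end{cases}$$

(Dirac) delta function: 

$$\delta(t) = 0, \quad t \ne 0$$

$$\int_{-\infty}^{\infty} \delta(t)\,dt = 1$$

or, equivalently, 

$$\int_{-\infty}^{\infty} g(t)\,\delta(t - t_0)\,dt = g(t_0)$$

Sinc function: 

$$\operatorname{sinc}(x) = \frac{\sin(\pi x)}{\pi x}$$

Sine integral: 

$$\operatorname{Si}(u) = \int_0^{u} \frac{\sin x}{x}\,dx$$

Q-function: 

$$Q(u) = \frac{1}{\sqrt{2\pi}}\int_u^{\infty} \exp\!\left(-\frac{t^2}{2}\right)dt$$

Binomial coefficient: 

$$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$$

Bessel function of the first kind of order n: 

$$J_n(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \exp(jx\sin\theta - jn\theta)\,d\theta$$

Modified Bessel function of the first kind of zero order: 

$$I_0(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \exp(x\cos\theta)\,d\theta$$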

Abbreviations 


ADC        analog-to-digital converter 
ADM        adaptive delta modulation 
ADPCM      adaptive differential pulse-code modulation 
ADSL       asymmetric digital subscriber line 
AM         amplitude modulation 
APP        a posteriori probability 
ASK        amplitude-shift keying 
AWGN       additive white Gaussian noise 
BCJR       Bahl, Cocke, Jelinek, and Raviv (algorithm) 
BER        bit error rate (chart) 
BPF        band-pass filter 
BSC        binary symmetric channel 
cdf        cumulative distribution function 
CDM        code-division multiplexing 
CDMA       code-division multiple access 
codec      coder/decoder 
CPFSK      continuous-phase frequency-shift keying 
CW         continuous wave 
DAC        digital-to-analog converter 
dB         decibel 
dBW        decibel referenced to 1 W 
dBmW       decibel referenced to 1 mW 
DC         direct current 
DEM        demodulator 
DFT        discrete Fourier transform 
DM         delta modulation 
DMT        discrete multitone 
DPCM       differential pulse-code modulation 
DPSK       differential phase-shift keying 
DSB-SC     double sideband-suppressed carrier 
DS/BPSK    direct sequence/binary phase-shift keying (for spread spectrum signals) 
DSL        digital subscriber line 
DTV        digital television 
exp        exponential, e.g., e^x is written as exp(x); both are used interchangeably 
FFT        fast Fourier transform (algorithm) 
FIR        finite-duration impulse-response (filter) 
FM         frequency modulation 
FSK        frequency-shift keying 
GMSK       Gaussian filtered MSK 
Hz         hertz 
IDFT       inverse discrete Fourier transform 
IF         intermediate frequency 
IFFT       inverse fast Fourier transform (algorithm) 
IIR        infinite-duration impulse response (filter) 
I/O        input/output 
ISI        intersymbol interference 
LDM        linear delta modulation 
LFSR       linear feedback shift register 
LMS        least-mean-square (algorithm) 
ln         natural logarithm 
log_2      logarithm to base 2 
log_10     logarithm to base 10 
LPC        linear predictive coding (model) 
LPF        low-pass filter 
MAP        maximum a posteriori (probability) 
ML         maximum likelihood 
mmse       minimum mean-square error 
modem      modulator-demodulator 
ms         millisecond 
μs         microsecond 
nm         nanometer 
NRZ        nonreturn-to-zero 
OFDM       orthogonal frequency-division multiplexing 
OFDMA      orthogonal frequency-division multiple access 
OOK        on-off keying 
PAM        pulse-amplitude modulation 
PAPR       peak-to-average power ratio 
PCM        pulse-code modulation 
pdf        probability density function 
PG         processing gain 
PSK        phase-shift keying 
QAM        quadrature amplitude modulation 
QPSK       quadriphase-shift keying 
RC         raised cosine (spectrum) 
RF         radio frequency 
rms        root mean-square 
RS         Reed-Solomon (code) 
RSC        recursive systematic convolutional (code) 
RZ         return-to-zero 
s          second 
SIR        signal-to-interference ratio 
SNR        signal-to-noise ratio 
SRRC       square-root raised cosine (spectrum) 
TCM        trellis-coded modulation 
TDL        tapped-delay line (filter) 
TV         television 
UHF        ultrahigh frequency 
UMTS       Universal Mobile Telecommunication System 
V          volt 
W          watt 
           characteristic function of random variable X with sample value x 
π          interleaver 
π^-1       de-interleaver 
NEXT-dominated channel, 
252-253 

Channel-coding theorem 

binary symmetric channels, 
234-235 

introduction, 232-234 
repetition code, 235-236 


Characteristic function, 112-113 
Chi-square distribution, A1-A3 
Clipping-filtering, A37-A38 
Coantenna interference (CAI), 
546-547 

Code division multiple access, fading 
channels 

Gold codes, correlation properties, 
563-564 

Gold sequences, 562-563 
introduction, 560-561 
Walsh-Hadamard sequences, 
561-562 

Coding, history of, 1-2 
Coherent detection of AWGN 
channel signaling 
correlation receiver, 341-342 
matched filter receiver, 342-343 
maximum likelihood decoding, 
337-341 

Coherent detection of binary FSK 
error probability, 378-380 
generation and detection, 377-378 
power spectra, 380-382 
Coherent detection of FSK 

bandwidth efficiency, M-ary FSK 
signals, 396-397 
introduction, 375-377 
M-ary FSK, introduction, 

395-397 

M-ary FSK versus M-ary PSK, 
398-399 

minimum shift keying, 382-383 
phase trellis, 383-384 
power spectra, M-ary FSK signals, 
396 

Coherent detection of FSK, MSK 
error probability, 390-391 
Gaussian filtering, 392-395 
generation and detection, 389-390 
power spectra, 391-392 
signal-space diagram, 384-388 
waveforms, 388-389 
Coherent detection of optimum 

AWGN receivers. See also FSK 
(frequency-shift keying) 
coherent detection; PSK (phase- 
shift keying) coherent detection, 
correlation receiver, 341-342 
matched filter receiver, 342-343 
maximum likelihood decoding, 
337-341 
Colored noise channels, information 
capacity 

capacity of NEXT-dominated 
channel, 252-253 
introduction, 248-252 
Communication process 

digital communication, 9-11 
introduction, 2-4 
multiple-access techniques, 4-5 
networks, 6-9 

Composite hypothesis testing, 
132-133 

Compound probabilistic codes, 
introduction, 644-645 
Compound probabilistic codes, 

LDPC codes 
(10, 3, 5) codes, 669-671 
introduction, 646-669 
irregular codes, 674-675 
minimum distance, 671-672 
probabilistic decoding, 672-674 
Constrained optimization problem, 
484-487 

Convolutional codes. See Error- 

control coding for convolutional 
codes. 

Convolutional interleaving, A32-A33 
Cosine transformation of a random 
variable, 109-112 
Cross-correlation functions, 
autocorrelation function, 
155-157 

Cross-spectral densities property, 
weakly stationary stochastic 
processing, 172-174 
Cyclic codes, error-control coding 
calculating the syndrome, 

598-599 

cyclic property, 593 
encoding, 597-598 
generator matrices, 596-597 
generator polynomials, 594-595 
Hamming codes, 599-603 
introduction, 593-594 
linearity property, 593 
maximal-length codes, 603-604 
parity-check matrices, 596-597 
parity-check polynomials, 
595-596 
properties, 593 

Reed-Solomon codes, 604-605 


Decision-feedback equalization, 
473-474 

Delta modulation (DM). See DM 
(delta modulation). 

DFT (discrete Fourier transform). See 
also IDFT (inverse discrete 
Fourier transform), 
binary sequence for energy 
calculation, 19-21 
Dirac delta function, 28-33 
interpreting, 70-72 
introduction, 16-19 
linear time-invariant systems, 
37-41 
pairs, 24 

periodic signals, 34-36 
theorems, 23 
time functions, 24 
unit Gaussian pulse, 21-22 
DFT (discrete Fourier transform), 
DMT systems 
description, 489-491 
DFT-based DMT systems, 
492-493 

DMT-based DSL, practical 
applications, 493-494 
frequency-domain channel 
descriptions, 491-492 
introduction, 487-489 
DFT (discrete Fourier transform), 
numerical computation 
computing the IDFT, 77-78 
FFT algorithms, 72-77 
interpretation of DFT and IDFT, 
70-72 

introduction, 69-70 
DFT-based DMT systems, 492-493 
Differential entropy 

mutual information, 237-240 
uniform distribution, 238-240 
Differential phase-shift keying 

(DPSK). See DPSK (differential 
phase-shift keying). 

Differential pulse-code modulation 
(DPCM). See DPCM 
(differential pulse-code 
modulation). 

Digital communication introduction, 
9-11 

Digital subscriber lines (DSL). See 
DSL (digital subscriber lines). 


Dirac delta function, 28-33 
Discrete Fourier transform (DFT). 
See DFT (discrete Fourier 
transform). 

Discrete memoryless channels 
binary symmetric channel, 225 
introduction, 223-225 
Discrete memoryless channels, error- 
control coding 

channel coding theorem, 580-581 
introduction, 579-580 
notation, 582 

Discrete multicarrier transmission 
(DMT). See DMT (discrete 
multicarrier transmission). 
Distortionless baseband data 
transmission, 450-454 
Distribution functions, Bernoulli 
random variable, 101-105 
DM (delta modulation) 
adaptive DM, 308 
introduction, 305 
quantization errors, 307-308 
receiver, 307 
transmitter, 305-307 
DMT (discrete multicarrier 

transmission) system, DFT 
description, 489-491 
DFT-based DMT systems, 
492-493 

DMT-based DSL, practical 
applications, 493-494 
frequency-domain channel 
descriptions, 491-492 
introduction, 487-489 
DMT (discrete multicarrier 

transmission) system, loading, 
482-484 

DPCM (differential pulse-code 
modulation) 

DPCM receiver, 303 
DPCM transmitter, 303 
introduction, 301-303 
processing gain, 304 
DPSK (differential phase-shift 

keying). See also PSK (phase- 
shift keying), introduction, 
error probability, 412-413 
generating DPSK signals, 413 
illustration, 412 
introduction, 411-412 
optimum receiver, 413-415 
DSB-SC modulation, 60-61 
DSL (digital subscriber lines) 
band-limited channels, 475-477 
DMT-based, practical 
applications, 493-494 

Entropy 

Bernoulli random variable, 
211-212 

differential, 237-240 
of extended source, 213-214 
extension of a discrete 

memoryless source, 212-213 
introduction, 207-209 
properties of, 209-211 
relative, 238-239 
Envelopes 

band-pass signals, complex 
envelopes, 47-49 
low-pass signals, 47 
narrowband noise, 191-193 
pre-envelopes, 45-47 
Equal gain combining, 538 
Ergodic processes, weakly stationary 
stochastic processing, 157-158 
Error rates in band-limited channels 
due to channel noise in matched- 
filter receivers, 446-447 
Error-control coding. See also 

Compound probabilistic codes, 
forward error correction, 578-579 
introduction, 577-578 
LDPC codes. See LDPC (low- 
density parity-check) codes. 
Error-control coding, exit charts 
approximate Gaussian model, 
661-663 

developing, 658-661 
histogram computation method, 
663-666 

introduction, 657-658 
measuring, 664-666 
Error-control coding, turbo coding 
extrinsic information, 649-650 
introduction, 645-646 
mathematical feedback analysis, 
651-653 

performance, 648-649 
serial concatenated codes, 

681-687 

turbo decoder, 650-651 
two-state encoder, 646-648 


UMTS turbo decoder, 653-657 
UMTS with binary PSK 
modulation, 653-657 
Error-control coding for 
convolutional codes 
code tree, 607-608 
convolutional encoder, 606-607 
introduction, 605-606 
optimum decoding, 613-614 
recursive systematic, 611-613 
state diagrams, 609-611 
trellis graph, 609. See also 
Trellis-coded modulation. 
Error-control coding for 

convolutional codes, maximum 
a posteriori probability decoding 
algorithmic metrics, 627-628 
AWGN channel, branch metric 
evaluation, 630-634 
BCJR algorithm, 623-624, 638 
forward-backward recursions, 
626-630 

introduction, 623-624 
lattice-based framework for the 
derivation, 625-626 
log-MAP algorithm, 636-638 
MAP decoding algorithm, 
624-625, 635-638 
max-log-MAP algorithm, 
636-638, 639-644 
a posteriori L-value, finalizing, 
634 

Error-control coding for 

convolutional codes, maximum 
a posteriori probability max- 
decoding, 636-638 
Error-control coding for 

convolutional codes, maximum 
likelihood decoding 
asymptotic coding gain, 622-623 
correct decoding of received all- 
zero sequences, 617-618 
free distance, 620-621 
incorrect decoding of received all- 
zero sequences, 619 
introduction, 614-616 
Viterbi algorithm, 616-617, 623 
Error-control coding for cyclic codes 
calculating the syndrome, 

598-599 

cyclic property, 593 
encoding, 597-598 
generator matrices, 596-597 


generator polynomials, 594-595 
Hamming codes, 599-603 
introduction, 593-594 
linearity property, 593 
maximal-length codes, 603-604 
parity-check matrices, 596-597 
parity-check polynomials, 
595-596 
properties, 593 

Reed-Solomon codes, 604-605 
Error-control coding for discrete 
memoryless channels 
channel coding theorem, 580-581 
introduction, 579-580 
notation, 582 

Error-control coding for linear coding 
blocks 

Hamming codes, 590-592 
introduction, 582-585 
minimum distance considerations, 
587-589 

syndrome decoding, 589-590 
syndrome definition and 
properties, 585-587 
Exit charts 

approximate Gaussian model, 
661-663 

developing, 658-661 
histogram computation method, 
663-666 

introduction, 657-658 
measuring, 664-666 
Expectation 

introduction, 105-106 
linearity, 107-108 
statistical independence, 108 
Exponential distribution, 110-111 
Eye patterns 

for binary systems, 467-469 
introduction, 463-464 
for M-ary transmissions, 466 
peak distortion for intersymbol 
interference, 465-466 
for quaternary systems, 467-469 
timing features, 464 

Fading channels 

comparison of modulation 
schemes, 525-527 
diversity techniques, 525 
effects of flat fading, 525-527 
introduction, 501-502 
propagation effects, 502-505 
RAKE receiver and multipath 
diversity, 564-566 
Fading channels, code division 
multiple access 

Gold codes, correlation properties, 
563-564 

Gold sequences, 562-563 
introduction, 560-561 
Walsh-Hadamard sequences, 
561-562 

Fading channels, FIR modeling of 
doubly spread channels 
generating tap coefficients, 
523-524 

introduction, 520-523 
practical matters, 523 
Rayleigh processes, 524 
Rician-Jakes Doppler spectrum 
model, 524-525 
Fading channels, Jakes model 
illustrative generation of fading 
processes, 510-511 
implemented as a FIR filter, 
509-511 

introduction, 506-509 
Fading channels, MIMO capacity 
channel known at the transmitter, 
555-556 

ergodic capacity, 551-553 
log-det formula capacity, 

553-554 

outage capacity, 554-555 
Fading channels, MIMO systems 
basic baseband channel model, 
547-551 

CAI (coantenna interference), 
546-547 

introduction, 546 
Fading channels, OFDM 
introduction, 556 
PAPR problem, 556-557 
Fading channels, space diversity-on- 
receive systems 
equal gain combining, 538 
introduction, 528 
maximum-ratio combining, 
533-537 

outage probability for maximal- 
ratio combiner, 537 
outage probability of selection 
combiner, 532 

selection combining, 528-532 


Fading channels, space diversity-on- 
transmit receive systems 
Alamouti code, 540-541 
full-rate complex code, 541 
introduction, 538-539 
linearity, 542-546 
maximum likelihood decoding, 
545-546 

QPSK (quadriphase-shift keying), 
539 

receiver considerations, Alamouti 
code, 542-545 

unitarity (complex orthogonality), 
541 

Fading channels, spread spectrum 
signals 

classification of spread spectrum 
signals, 557-558 
introduction, 557-558 
processing gain of the DS/BPSK, 
559 

Fading channels, statistical 

characterization of wideband 
wireless channels 
classification of multipath 
channels, 519-520 
Doppler power spectrum, 
517-519 

introduction, 511-512 
multipath correlation function of 
the channel, 512 
power-delay profile, 516-517 
scattering function of the channel, 
514-516 

spaced-frequency, spaced-time 
correlation function of the 
channel, 514 

uncorrelated scattering, 513 
wide-sense stationarity, 512-513 
FFT (fast Fourier transform) 
algorithms, 72-77 
Filtering two jointly weakly 
stationary processes, 174 
FIR (finite-duration impulse 
response) modeling, 
introduction, 456-458 
FIR (finite-duration impulse 

response) modeling of doubly 
spread fading channels 
generating tap coefficients, 
523-524 

introduction, 520-523 
practical matters, 523 


Rayleigh processes, 524 
Rician-Jakes Doppler spectrum 
model, 524-525 
Fourier series, 13-16 
Fourier transform. See DFT (discrete 
Fourier transform); IDFT 
(inverse discrete Fourier 
transform). 

Frequency-domain 

description, 56-58, 268-271 
relation to time-domain, 25-28 
FSK (frequency-shift keying). See 
also AWGN channel signaling, 
introduction, 375-377 
noncoherent detection of binary 
FSK, 410-411 

FSK (frequency-shift keying) 
coherent detection. See also 
PSK (phase-shift keying), 
introduction. 

bandwidth efficiency, M-ary FSK 
signals, 396-397 
M-ary FSK, introduction, 

395-397 

M-ary FSK versus M-ary PSK, 
398-399 

minimum shift keying, 382-383 
phase trellis, 383-384 
power spectra, M-ary FSK signals, 
396 

FSK (frequency-shift keying) 

coherent detection, binary FSK 
error probability, 378-380 
generation and detection, 377-378 
power spectra, 380-382 
FSK (frequency-shift keying) 
coherent detection, MSK 
error probability, 390-391 
Gaussian filtering, 392-395 
generation and detection, 389-390 
power spectra, 391-392 
signal-space diagram, 384-388 
waveforms, 388-389 
Full-rate complex code, 541 

Gaussian distribution 
introduction, 113 
jointly Gaussian random 
variables, 116 

linear function of a Gaussian 
random variable, 114 
mean, 114 
random variables, 239-240 
standard distribution, table of, 117 
sum of independent Gaussian 
random variables, 114 
variance, 114 
Gaussian process 
independence, 179 
introduction, 176-177 
linear filtering, 177-178 
multivariate distribution, 178 
stationarity, 179 

Geometric representation of AWGN 
channel signals 
2B1Q code, 331-332 
Gram-Schmidt orthogonalization, 
329-331 

introduction, 324-328 
Schwarz inequality, 328-329 
Gold codes, correlation properties, 
563-564 

Gold sequences, 562-563 
Gram-Schmidt orthogonalization, 
329-331 

Group delays, 66-69 

Hilbert transform 
introduction, 42-44 
low-pass signals, 44-45 
Huffman coding, lossless data 
compression, 219-220 
Huffman tree, lossless data 
compression, 220-221 
Hypothesis testing 
binary, 130-132 
composite, 132-133 
introduction, 126-129 

Ideal band-pass filtered white noise, 
189-190 

Ideal low-pass filtered white noise, 
181-182 

Ideal Nyquist pulse, band-limited 
channels, 450-454 
IDFT (inverse discrete Fourier 
transform). See also DFT. 
computing, 77-78 
interpreting, 70-72 
Information capacity, colored noise 
channels 

capacity of NEXT-dominated 
channel, 252-253 


introduction, 248-252 
Information capacity law 

capacity of binary-input AWGN 
channels, 244-248 
implications of, 244-248 
introduction, 240-243 
PCM noise, 292-294 
sphere packing, 243-244 
Information theory, history of, 1-2 
Integrals, table of, A57 
Interleaving 

block, A30-A32 
convolutional, A32-A33 
introduction, A29-A30 
random, A33-A34 
Intersymbol interference, band- 
limited channels, 447-449 
Inverse discrete Fourier transform 
(IDFT). See IDFT (inverse 
discrete Fourier transform). 

Jakes model, fading channels 
illustrative generation of fading 
processes, 510-511 
implemented as a FIR filter, 
509-511 

introduction, 506-509 
Jointly Gaussian random variables, 
116 

Kraft inequality, lossless data 
compression, 217-219 

Lagrange multipliers, A19-A20 
LDPC (low-density parity-check) 
codes 

(10, 3, 5) codes, 669-671 
history of, 645 
introduction, 646-669 
irregular codes, 674-675 
minimum distance, 671-672 
probabilistic decoding, 672-674 
Least-mean-square (LMS) algorithm, 
470-472 

Lempel-Ziv coding, lossless data 
compression, 221-223 
Line codes 

bipolar RZ signaling, 311 
introduction, 309-310 
Manchester code, 311 
polar NRZ signaling, 311 
split phase, 311 


unipolar NRZ signaling, 311 
unipolar RZ signaling, 311 
Linear coding blocks, error-control 
coding 

Hamming codes, 590-592 
introduction, 582-585 
minimum distance considerations, 
587-589 

syndrome decoding, 589-590 
syndrome definition and 
properties, 585-587 
Linear function of a Gaussian random 
variable, 114 
Linear modulation theory 
DSB-SC modulation, 60-61 
introduction, 58-60 
SSB modulation, 64-66 
summary of modulation methods, 
66 

VSB modulation, 61-64 
Linear time-invariant filter, 

transmitting weakly stationary 
stochastic processing, 158-160 
Linear time-invariant systems, 37-41 
Linearity, expectation, 107-108 
LMS (least-mean-square) algorithm, 
470-472 

Log-normal distribution, A3-A6 
Lossless data compression algorithms 
Huffman coding, 219-220 
Huffman tree, 220-221 
introduction, 215-216 
Kraft inequality, 217-219 
Lempel-Ziv coding, 221-223 
prefix coding, 216-217 
Low-density parity-check (LDPC). 
See LDPC (low-density parity- 
check). 

Low-pass signals 
envelopes, 47 
Hilbert transform, 44-45 

Manchester code, 311 
MAP (maximum a posteriori 

probability) decoding algorithm, 
624-625, 635-638 
Mathematical tables 
integrals, A57 
series expansions, A56 
trigonometric identities, A55 
unit prefixes, A58 
useful constants, A58 
Maximal-length sequences 
choosing, A50-A54 
code generation, A51-A54 
correlation property, A48-A50 
introduction, A47 
properties of, A47-A50 
Maximum likelihood decoding, 
545-546 

Maximum-ratio combining, 533-537 
Mean functions, weakly stationary 
stochastic processing, 149-157 
Mean-square value property 
autocorrelation function, 151 
weakly stationary stochastic 
processing, 164 

Method of Lagrange multipliers, 
A19-A20 

MIMO (multiple input, multiple 

output) capacity, fading channels 
channel known at the transmitter, 
555-556, A24-A28 
ergodic capacity, 551-553 
log-det formula capacity, 

553-554, A21-A24 
outage capacity, 554-555 
Mixing random processes with 
sinusoidal, weakly stationary 
stochastic processing, 167-169 
Monotonicity of the distribution, 99 
Monte Carlo integration, A45-A46 
m-sequences. See Maximal-length 
sequences. 

MSK (minimum shift keying), FSK 
coherent detection 
error probability, 390-391 
Gaussian filtering, 392-395 
generation and detection, 389-390 
power spectra, 391-392 
signal-space diagram, 384-388 
waveforms, 388-389 
Mutual information 

continuous random ensembles, 
237-240 

differential entropy, 237-240 
expansion, 228-229 
introduction, 226-227 
nonnegativity, 228 
symmetry, 227-228 

Nakagami distribution, A6-A9 
Narrowband noise 
envelope, 191-193 


ideal band-pass filtered white 
noise, 189-190 
introduction, 183-189 
phase components, 191-193 
plus sine wave, 193-195 
Rayleigh distribution, 192-193 
Rician distribution, 194-195 
Networks, introduction, 6-9 
NEXT-dominated channel, capacity 
of, 252-253 

Noise. See also Narrowband noise; 
White noise, 
definition, 179 
shot, 180 
thermal, 180 
Noise, PCM 

error threshold, 291-292 
information capacity law, 
292-294 

introduction, 290-291 
Noncoherent detection, binary FSK, 
410-411 

Noncoherent orthogonal modulation, 
AWGN channel signaling, 
404-410 

Nonlinear solid-state power 
amplifiers, A39-A43 
Nonnegativeness property, weakly 
stationary stochastic processing, 
164 

Nonnegativity function, 99 
Normalization function, 99-100 
Normalization property 

autocorrelation function, 152 
weakly stationary stochastic 
processing, 165 

OFDM (orthogonal frequency 

division multiplexing), PAPR 
problem 

clipping-filtering, PAPR 
reduction, A37-A38 
fading channels, 556-557 
introduction, A35 
maximum PAPR using M-ary 
PSK, A36-A37 
properties of OFDM signals, 
A35-A36 
Outage probability 

for maximal-ratio combiner, 537 
of selection combiner, 532 


PAM (pulse-amplitude modulation), 
274-277 

PAPR (peak-to-average power ratio) 
problem 

clipping-filtering, PAPR 
reduction, A37-A38 
fading channels, 556-557 
introduction, A35 
maximum PAPR using M-ary 
PSK, A36-A37 
properties of OFDM signals, 
A35-A36 

Parameter estimation 

in additive noise, 124-125 
introduction, 122-124 
Partitioning continuous-time 
channels 

geometric SNR, 481-482 
introduction, 478-481 
loading the DMT system, 
482-484 

PCM (pulse-code modulation) 
encoding the transmitter, 288 
introduction, 285-286 
inverse operations in the receiver, 
288-289 

quantization of the transmitter, 
286-288 

regeneration along the transmitter 
path, 288-290 

PCM (pulse-code modulation), noise 
considerations 
error threshold, 291-292 
information capacity law, 
292-294 

introduction, 290-291 
Periodic signals, Fourier transform, 
34-36 

Phase components, narrowband 
noise, 191-193 
Phase delays, 66-69 
Phase-shift keying (PSK). See PSK 
(phase-shift keying). 

Poisson process, weakly stationary 
stochastic processing, 174-176 
Polar NRZ signaling, 311 
Prediction-error filtering, redundancy 
reduction 

discrete time structure for 
predictions, 296-299 
introduction, 294-295 
linear adaptive prediction, 

300-301 

theoretical considerations, 
295-296 

Pre-envelopes, 45-47 
Prefix coding, lossless data 
compression, 216-217 
Probabilistic compound codes. See 
Compound probabilistic codes. 
Probabilistic model, 90-97 
Probability theory 

characteristic function, 112-113 
introduction, 87-90 
probabilistic model, 90-97 
random variables, 97-98 
set theory, 88-90 
Probability theory, central limit 
theorem 

introduction, 118 
sum of uniformly distributed 
random variables, 118-119 
Probability theory, distribution 
functions 

Bernoulli random variable, 
101-105 

boundedness of the distribution, 

98 

introduction, 98 

monotonicity of the distribution, 

99 

nonnegativity, 99 
normalization, 99-100 
uniform distribution, 100-101 
Probability theory, expectation 
introduction, 105-106 
linearity, 107-108 
statistical independence, 108 
Probability theory, Gaussian 
distribution 
introduction, 113 
jointly Gaussian random 
variables, 116 

linear function of a Gaussian 
random variable, 114 
mean, 114 

standard distribution, table of, 117 
sum of independent Gaussian 
random variables, 114 
variance, 114 


Probability theory, second-order 
statistical averages 
cosine transformation of a random 
variable, 109-112 
exponential distribution, 110-111 
introduction, 108-109 
Processing gain, DPCM, 304 
Properties, weakly stationary 
stochastic processing 
cross-spectral densities, 172-174 
filtering two jointly weakly 
stationary processes, 174 
introduction, 160-161, 170-172 
mean-square value of stationary 
process, 164 

mixing random processes with 
sinusoidal, 167-169 
nonnegativeness, 164 
normalization, 165 
random binary wave, 166-167 
sinusoidal wave with random 
phase, 165-166 
sum of two processes, 173 
symmetry, 164 
Wiener-Khintchine theorem, 
169-170 

zero correlation among frequency 
components, 162-163 
zero-frequency value, 164 
PSK (phase-shift keying), 

introduction, 352. See also 
AWGN channel signaling; 
DPSK (differential phase-shift 
keying); FSK (frequency-shift 
keying). 

PSK (phase-shift keying), M-ary 
QAM 

average probability of error, 
373-375 

introduction, 370-371 
for M = 4, 371-373 
QAM square constellations, 371 
square constellations, 371 
PSK (phase-shift keying) coherent 
detection 

binary phase-shift keying, 
352-357 

error probability, binary PSK, 
354-356 
introduction, 352 
M-ary PSK, introduction, 

367-370 


M-ary PSK versus M-ary FSK, 
398-399 

PSK (phase-shift keying) coherent 
detection, power spectra 
binary PSK, 356-357 
M-ary PSK, 367-370 
PSK (phase-shift keying) coherent 
detection, QPSK 
error probability, 362-364 
introduction, 357-359 
offset QPSK, 365-367 
power spectra, 364-365 
signal-space diagrams, 358-359 
waveforms, 359-365 
PSK (phase-shift keying) coherent 
detection, signal-space diagrams 
binary PSK, 353-354 
QPSK signals, 358-359 
Pulse-amplitude modulation (PAM), 
274-277 

Pulse-code modulation (PCM). See 
PCM (pulse-code modulation). 

QAM (quadrature amplitude 
modulation) 

average probability of error, 
373-375 

introduction, 370-371 
M-ary QAM for M = 4, 371-373 
square constellations, 371 
Q-function, bounds on, A11-A12 
QPSK (quadriphase-shift keying), 
PSK coherent detection 
error probability, 362-364 
introduction, 357-359 
offset QPSK, 365-367 
power spectra, 364-365 
signal-space diagrams, 358-359 
waveforms, 359-365 
QPSK (quadriphase-shift keying), 
space diversity-on-transmit 
receive systems, 539 
Quadrature-modulated processes, 
autocorrelation function, 
156-157 
Quantization 

errors, delta modulation, 307-308 
introduction, 278-279 
noise, 279-281 
scalar quantizers, optimality, 
282-285 

sinusoidal modulating signal, 
281-282 
Raised cosine (RC). See RC (raised 
cosine). 

RAKE receiver and multipath 
diversity, 564-566 
Random binary wave property 

autocorrelation function, 154-155 
weakly stationary stochastic 
processing, 166-167 
Random interleaving, A33-A34 
Random processes, mixing with 
sinusoidal, 167-169 
Random variables 
Bernoulli, 101-105 
cosine transformation, 109-112 
probability theory, 97-98 
Random variables, Gaussian 
jointly Gaussian, 116 
linear function of, 114 
sum of independent, 114 
Rate distortion theory 

Gaussian sources, 255-256 
introduction, 253-255 
Rayleigh distribution, 192-193 
Rayleigh processes, 524 